R Interfaces

Jan-Philipp Kolb

8 May 2017

Introduction and Motivation

Advantages of R

Reasons

Why R?

Modular structure

Disadvantages of R

  1. Data are often collected elsewhere
  2. Not everyone is willing to work with R
  3. R is not installed on every computer
  4. R is sometimes too slow
  5. Working with large amounts of data can be difficult

What follows from this

  1. Interfaces to SPSS/Stata/Excel for importing data
  2. Interface to Word
  3. The ability to create HTML presentations
  4. Use of C++
  5. Use of databases

Using interfaces for import/export

Import

Reproducible Research

How does Wikipedia define reproducibility?

Presenting results

Why the interface to C++?

Using databases

Using the course materials on GitHub

How is the GitHub repository used?

https://github.com/Japhilko/RInterfaces

Printing information

Raw button for downloading

Downloading further files

Organizational matters

For those who want to learn more about GitHub:

CRAN Task Views

Exercise - add-on packages

Go to https://cran.r-project.org/ and, in the section where the packages are presented, search for packages,…

Data import

File formats in R

Formats - base package

R already supports several important formats out of the box:
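As a rough illustration of what base R can handle without extra packages, the sketch below writes a small data frame to a temporary directory and reads it back as CSV, as tab-delimited text and as an RDS file. The data and all object and file names are made up for this example.

```r
# Hypothetical example data; not part of the course materials
df <- data.frame(id = 1:3, city = c("Mannheim", "Berlin", "Hamburg"))

tmp <- tempdir()
csv_file <- file.path(tmp, "example.csv")
txt_file <- file.path(tmp, "example.txt")
rds_file <- file.path(tmp, "example.rds")

# Comma-separated values
write.csv(df, csv_file, row.names = FALSE)
from_csv <- read.csv(csv_file)

# Tab-delimited text
write.table(df, txt_file, sep = "\t", row.names = FALSE)
from_txt <- read.table(txt_file, header = TRUE, sep = "\t")

# R's own serialization format for a single object
saveRDS(df, rds_file)
from_rds <- readRDS(rds_file)

str(from_csv)
```

The RDS round trip keeps the column types exactly as they were, whereas the text formats have to guess them again when the file is read.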

Data import made easy with RStudio

Import button

Loading a CSV file from the web

https://data.montgomerycountymd.gov/api/views/6rqk-pdub/rows.csv?accessType=DOWNLOAD

The working directory

To find out which directory you are currently in:

getwd()

To change the working directory:

Create an object that stores the path:

main.path <- "C:/" # example for Windows
main.path <- "/Users/Name/" # example for Mac
main.path <- "/home/user/" # example for Linux

Then change the path with setwd():

setwd(main.path)

On Windows, it is important to use forward slashes instead of backslashes.
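The following sketch walks through the same steps in a way that should run on any platform: it switches to R's temporary directory and back again. The path passed to `file.path()` is only an illustrative assumption and does not need to exist.

```r
old_wd <- getwd()           # remember the current working directory
tmp <- tempdir()            # a directory that exists on every platform

setwd(tmp)                  # switch to it
in_tmp <- normalizePath(getwd()) == normalizePath(tmp)

# file.path() joins path components with forward slashes on every platform
example_path <- file.path("C:", "Users", "Name", "data.csv")

setwd(old_wd)               # switch back to where we started
```

Building paths with `file.path()` avoids typing backslashes by hand.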

Alternative - working directory

The readr package

install.packages("readr")
library(readr)

Importing CSV data (for example, exported from Excel)

library(readr)
rows <- read_csv("https://data.montgomerycountymd.gov/api/views/6rqk-pdub/rows.csv?accessType=DOWNLOAD")

Importing .csv data from the web - second example

url <- paste0("https://raw.githubusercontent.com/Japhilko/",
              "GeoData/master/2015/data/whcSites.csv")

whcSites <- read.csv(url) 
head(data.frame(whcSites$name_en,whcSites$category))
##                                                      whcSites.name_en
## 1 Cultural Landscape and Archaeological Remains of the Bamiyan Valley
## 2                           Minaret and Archaeological Remains of Jam
## 3                          Historic Centres of Berat and Gjirokastra 
## 4                                                             Butrint
## 5                                             Al Qal'a of Beni Hammad
## 6                                                        M'Zab Valley
##   whcSites.category
## 1          Cultural
## 2          Cultural
## 3          Cultural
## 4          Cultural
## 5          Cultural
## 6          Cultural

The haven package

install.packages("haven")
library(haven)

Reading SPSS files

install.packages("haven")
library(haven)
mtcars <- read_sav("https://github.com/Japhilko/RInterfaces/raw/master/data/mtcars.sav")

Reading Stata files

library(haven)
oecd <- read_dta("https://github.com/Japhilko/IntroR/raw/master/2017/data/oecd.dta")

Data export

R's export formats

Creating an example data set

A <- c(1,2,3,4)
B <- c("A","B","C","D")

mydata <- data.frame(A,B)

Overview of data import/export

save(mydata, file="mydata.RData")

Saving data in .csv format

write.csv(mydata,file="mydata.csv")   # comma as separator, "." as decimal mark
write.csv2(mydata,file="mydata2.csv") # semicolon as separator, "," as decimal mark
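To make the difference between the two CSV writers visible, here is a minimal sketch that writes the same data frame with both functions and reads it back with the matching reader. The value 1.5 is an assumption added to the example data so that the decimal mark shows up, and temporary files stand in for real file names.

```r
# Example data with one non-integer value so the decimal mark is visible
mydata <- data.frame(A = c(1.5, 2, 3, 4), B = c("A", "B", "C", "D"))

f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")

write.csv(mydata, file = f1, row.names = FALSE)   # "," separator, "." decimal
write.csv2(mydata, file = f2, row.names = FALSE)  # ";" separator, "," decimal

# Look at the raw text of the first two lines of each file
readLines(f1, n = 2)
readLines(f2, n = 2)

# Each file is read back with the matching function
back1 <- read.csv(f1)
back2 <- read.csv2(f2)
```

`write.csv2()` and `read.csv2()` follow the convention common in countries that use a decimal comma, which is also what a German Excel installation typically expects.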

The xlsx package

library(xlsx)
write.xlsx(mydata,file="mydata.xlsx") 

The foreign package

Saving data in Stata format

library(foreign)
write.dta(mydata,file="data/mydata.dta") 

The rio package

install.packages("rio")

Saving data as .sav (SPSS)

library("rio")
# create file to convert

export(mtcars, "data/mtcars.sav")

Converting file formats

export(mtcars, "data/mtcars.dta")

# convert Stata to SPSS
convert("data/mtcars.dta", "data/mtcars.sav")

R and Excel

The xlsx package

library("xlsx")
dat <- read.xlsx("cult_emp_sex.xls",1)

A few steps to connect R and Excel

install.packages("XLConnect")
library("XLConnect")

Vignette for XLConnect

Creating an Excel file from R

fileXls <- "data/newFile.xlsx"
unlink(fileXls, recursive = FALSE, force = FALSE)
exc <- loadWorkbook(fileXls, create = TRUE)
createSheet(exc,'Input')
saveWorkbook(exc)

Filling the worksheet with data

input <- data.frame('inputType'=c('Day','Month'),'inputValue'=c(2,5))
writeWorksheet(exc, input, sheet = "Input", startRow = 1, startCol = 2)
saveWorkbook(exc)

BERT - another connection between R and Excel

myFunction <- function(){
 aa <- rnorm(200)
 bb <- rnorm(200)
 res <- residuals(lm(aa~bb)) # residuals of the linear model
 return(res)
}

The readxl package

install.packages("readxl")
library(readxl)

Presenting data - Reproducible Research

CRAN Task View on reproducible research

A crash course in reproducible research in R

Creating Word documents with R

Creating a Markdown document with RStudio

My first Word document created with R

First example

Working with Markdown

R Markdown - first steps

Markdown is a very simple syntax that lets users create well-formatted documents from plain text files.

**bold example**
*italic example*
~~strikethrough~~
- bullet point

bold example

italic example

strikethrough

More Markdown commands

### Heading level 3
#### Heading level 4
[My GitHub page](https://github.com/Japhilko)

Heading level 3

Heading level 4

My GitHub page

More Markdown commands

![Example](http://e-scientifics.de/content/example_kinderbild.jpg)
![Example 2](figure/example.png)

Chunks - first steps

Button for inserting chunks

Inline code

n=100

An inline code block: 100
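The inline result above can be reproduced outside of RStudio. The sketch below assumes the knitr package is installed and passes a one-line R Markdown source to `knit()` via its `text` argument; the variable names `src` and `out` are only illustrative.

```r
library(knitr)

n <- 100

# A one-line R Markdown source containing an inline code expression
src <- "An inline code block: `r n`"

# knit() evaluates the inline expression and returns the rendered text
out <- knit(text = src, quiet = TRUE)
out
```

In a real document the same backtick expression sits in the running text, and the value is filled in each time the document is knitted.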

Chunk options

| Argument | Description |
|---|---|
| eval | Should the R code be evaluated? |
| warning | Should warnings be shown? |
| cache | Should the output be cached? |

Options

Options for Word output

Code highlighting

The knitr package

install.packages("knitr")
library("knitr")

Creating a table with kable

a <- runif(10)
b <- rnorm(10)
ab <- cbind(a,b)
kable(ab)
| a | b |
|---:|---:|
| 0.9502345 | -0.2052916 |
| 0.1828050 | 0.3958346 |
| 0.3372943 | 0.5432295 |
| 0.5797910 | 1.4856080 |
| 0.0764772 | -0.7431788 |
| 0.4001664 | -0.9645617 |
| 0.2819558 | -1.0052538 |
| 0.0207996 | 1.4749363 |
| 0.8164361 | 0.4943767 |
| 0.5191753 | 0.8588139 |
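`kable()` takes a few arguments that are often useful for reports. A minimal sketch, assuming knitr is installed; the seed, the column labels and the caption are choices made for this example only.

```r
library(knitr)

set.seed(1)  # make the random numbers reproducible
ab <- cbind(a = runif(5), b = rnorm(5))

# Round to two digits, rename the columns and add a caption
tab <- kable(ab, digits = 2,
             col.names = c("Uniform", "Normal"),
             caption = "Five random draws")
tab
```

Rounding inside `kable()` only affects the printed table; the underlying object `ab` keeps its full precision.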

Using templates

  1. Create a Word document with R Markdown
  2. Open the document in Word and change the formatting
  3. Specify the template as a reference

Always the current date in the header

date: "07 May, 2017"
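One common way to get such a date is to let R format the current date when the document is knitted. The sketch below shows the formatting call on its own; the fixed date is only there to make the pattern visible, and the month name depends on the locale of the R session.

```r
# In the YAML header this is typically written as inline R code, e.g.
#   date: "`r format(Sys.time(), '%d %B, %Y')`"

# Day, full month name, four-digit year for today's date
header_date <- format(Sys.Date(), "%d %B, %Y")
header_date

# The same format applied to a fixed date for illustration
fixed <- format(as.Date("2017-05-07"), "%d %B, %Y")
fixed
```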

Resources

PDF documents and presentations with LaTeX, Beamer and Sweave

Presentations with R Markdown - Beamer presentations

Beamer options

Beamer themes

Inserting chunks

Result - cache

How to specify this in the document header

---
title: "Intro - First Steps"
author: "Jan-Philipp Kolb"
date: "10 April 2017"
output:
  beamer_presentation: 
    colortheme: beaver
    theme: CambridgeUS
---

Table of contents I

Table of contents II

output: 
  beamer_presentation: 
    toc: yes

Options for including graphics

Presentations with Sweave

Sweave presentation

Chunks in Sweave

Chunk options

Inline code

\Sexpr{}

Inline code - the result

PDF papers with R

JabRef

Getting a reference from R

install.packages("RMySQL")
citation("RMySQL")
## 
## To cite package 'RMySQL' in publications use:
## 
##   Jeroen Ooms, David James, Saikat DebRoy, Hadley Wickham and
##   Jeffrey Horner (2017). RMySQL: Database Interface and 'MySQL'
##   Driver for R. R package version 0.10.11.
##   https://CRAN.R-project.org/package=RMySQL
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {RMySQL: Database Interface and 'MySQL' Driver for R},
##     author = {Jeroen Ooms and David James and Saikat DebRoy and Hadley Wickham and Jeffrey Horner},
##     year = {2017},
##     note = {R package version 0.10.11},
##     url = {https://CRAN.R-project.org/package=RMySQL},
##   }
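The BibTeX entry printed by `citation()` can also be written straight to a `.bib` file. A minimal sketch using only base R; it cites R itself so that it runs without extra packages, and the file name `references.bib` is a made-up example.

```r
# citation() without arguments cites R itself; pass a package name
# (for example "RMySQL") to cite an installed package instead
ref <- citation()

# Hypothetical file name in the temporary directory
bib_file <- file.path(tempdir(), "references.bib")

# toBibtex() converts the citation to BibTeX lines
writeLines(toBibtex(ref), bib_file)

readLines(bib_file, n = 3)
```

The entry is written with an empty key (`@Manual{,`), so a key has to be added by hand or in a reference manager before the entry can be cited.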

Including the BibTeX file I

Including the BibTeX file II

---
title: "R Interfaces"
author: "Jan-Philipp Kolb"
date: "21 April 2017"
output: 
  pdf_document: default
bibliography: Rschnittstellen.bib
---

The result

HTML documents, presentations and dashboards with R Markdown

Presentations - Rpres, the easiest way

A first presentation

Entering first data

date()
## [1] "Sun May 07 14:34:41 2017"

A slide with a formula

$$
\begin{equation}\label{eq2}
t_{i}=\sum\limits_{k=1}^{M_{i}}{y_{ik}}=M_{i}\bar{Y}_{i}. 
\end{equation}
$$

Two columns

Slide with two columns
====================================
First column
***
Second column

Slide transitions

transition: rotate

Other possible slide transitions

Slide types

Insert a new section
====================================
type: section
Another slide type
====================================
type: prompt
Yet another slide type
====================================
type: alert

Changing the font

My presentation
========================================
author: Jan-Philipp Kolb
font-family: 'Impact'

Fonts can also be imported

My presentation
========================================
author: Jan-Philipp Kolb
font-import: http://fonts.googleapis.com/css?family=Risque
font-family: 'Risque'

Smaller text

Normal font size

<small>This sentence will appear smaller.</small>

Viewing the presentation

http://rpubs.com/Japhilko82/FirstRpubs

An ioslides presentation

ioslides - getting started

Doing more

![picture of spaghetti](images/spaghetti.jpg)

Adding a logo

---
title: "ioslides example"
author: "Jan-Philipp Kolb"
date: "20 April 2017"
output: 
  ioslides_presentation:
    logo: figure/Rlogo.png
---

Tables

library(knitr)
a <- data.frame(a=1:10,b=10:1)
kable(table(a))
1 2 3 4 5 6 7 8 9 10
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 1 0 0 0 0
0 0 0 0 1 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0

knitr Engines

A slidy presentation

slidy presentations

What are Cascading Style Sheets ([CSS](https://en.wikipedia.org/wiki/Cascading_Style_Sheets))?

CSS and R

Example CSS

Changing the CSS

To change the style of the presentation, you can modify the CSS.

HTML documents

Creating an HTML document

Using a template

Using further templates

install.packages("rticles")

Templates for Markdown

The rmdformats package - HTML Output Formats and Templates for ‘rmarkdown’

install.packages("rmdformats")
install.packages("ProjectTemplate")
install.packages("tufte")

Examples of templates

Dashboards

Example R packages

Installing the package

install.packages("flexdashboard", type = "source")

Creating a dashboard with RStudio

My first dashboard

Gallery

Notebooks for integrating other programming languages (Python, LaTeX, Julia)

Notebooks

R Notebooks

Creating an R Notebook

R Notebook - first steps

Integrating Python code

import sys
print(sys.version)
## 2.7.10 (default, May 23 2015, 09:44:00) [MSC v.1500 64 bit (AMD64)]

Integrating LaTeX code

Publishing a notebook I

Publishing a notebook II

Other notebooks

Jupyter Notebook

jupyter notebook

Starting Jupyter Notebook

Example code input

Beaker Notebook

Starting Beaker

Exercise: Continue working on a notebook

Interactive maps with the JavaScript library Leaflet

The data - World Heritage Sites

url <- paste0("https://raw.githubusercontent.com/Japhilko/",
              "GeoData/master/2015/data/whcSites.csv")

whcSites <- read.csv(url) 
whcSitesDat <- with(whcSites,data.frame(name_en,
                                        category))
library(knitr)
kable(head(whcSitesDat))
| name_en | category |
|---|---|
| Cultural Landscape and Archaeological Remains of the Bamiyan Valley | Cultural |
| Minaret and Archaeological Remains of Jam | Cultural |
| Historic Centres of Berat and Gjirokastra | Cultural |
| Butrint | Cultural |
| Al Qal’a of Beni Hammad | Cultural |
| M’Zab Valley | Cultural |

The DT package

install.packages("DT")

More variables in the WHC data set

whcSitesDat2 <- with(whcSites,data.frame(name_en,category,longitude,latitude,date_inscribed,area_hectares,danger_list))
library('DT')
datatable(whcSitesDat2)

The result on RPubs

http://rpubs.com/Japhilko82/WHCdata

The magrittr package

install.packages("magrittr")
library("magrittr")

Using pipes

library(magrittr)

str1 <- "Hallo Welt"
str1 %>% substr(1,5)
## [1] "Hallo"
str1 %>% substr(1,5) %>% toupper()
## [1] "HALLO"
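To see what the pipe does, it can help to compare a piped expression with the equivalent nested call. The sketch below assumes magrittr is installed; the object names `nested`, `piped` and `sub_res` are illustrative.

```r
library(magrittr)

str1 <- "Hallo Welt"

# The nested call is read from the inside out ...
nested <- toupper(substr(str1, 1, 5))

# ... the piped version is read from left to right
piped <- str1 %>% substr(1, 5) %>% toupper()

identical(nested, piped)

# A dot marks where the left-hand side goes when it is not the first argument
sub_res <- "Welt" %>% sub("Hallo", ., str1)
sub_res
```

By default the pipe passes the left-hand side as the first argument of the function on the right, which is why `substr(1, 5)` can be written without naming the string.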

The leaflet package

install.packages("leaflet")
library("leaflet")

What are tiles?

Creating an interactive map

m <- leaflet() %>%
  addTiles() %>%  # Add default OpenStreetMap map tiles
  addMarkers(lng=whcSites$longitude, 
             lat=whcSites$latitude, 
             popup=whcSites$name_en)
m

Showing the map

Adding color

whcSites$color <- "red"
whcSites$color[whcSites$category=="Cultural"] <- "blue"
whcSites$color[whcSites$category=="Mixed"] <- "orange"
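The same color assignment can be written as a lookup in a named vector. The sketch below uses a tiny made-up data frame instead of the real `whcSites` data, and it assumes that the only category besides "Cultural" and "Mixed" is "Natural", which would receive the default red.

```r
# Hypothetical miniature stand-in for the whcSites data
sites <- data.frame(category = c("Cultural", "Natural", "Mixed", "Cultural"))

# Named vector that maps each category to a color
pal <- c(Cultural = "blue", Mixed = "orange", Natural = "red")

# Index the palette by category name; unname() drops the category labels
sites$color <- unname(pal[as.character(sites$category)])
sites
```

A lookup vector keeps the mapping in one place, which is convenient if more categories or different colors are needed later.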

Creating a map with color

m1 <- leaflet() %>%
  addTiles() %>%  
  addCircles(lng=whcSites$longitude, 
             lat=whcSites$latitude, 
             popup=whcSites$name_en,
             color=whcSites$color)

The map with more color

World Heritage Sites

Saving the map

Showing and hiding layers

m2 <- leaflet() %>%
  addTiles(group = "OSM (default)") %>%  
  addProviderTiles("Stamen.Toner", group = "Toner") %>%
  addProviderTiles("Stamen.TonerLite", group = "Toner Lite") %>%
  addCircles(lng=whcSites$longitude, 
             lat=whcSites$latitude, 
             popup=whcSites$name_en) %>% 
  addLayersControl(
    baseGroups = c("OSM (default)", "Toner", "Toner Lite"),
    options = layersControlOptions(collapsed = FALSE)
  )
m2

Another example with earthquake data

outline <- quakes[chull(quakes$long, quakes$lat),]
map <- leaflet(quakes) %>%
  # Base groups
  addTiles(group = "OSM (default)") %>%
  addProviderTiles("Stamen.Toner", group = "Toner") %>%
  addProviderTiles("Stamen.TonerLite", group = "Toner Lite") %>%
  # Overlay groups
  addCircles(~long, ~lat, ~10^mag/5, stroke = FALSE, group = "Quakes") %>%
  addPolygons(data = outline, lng = ~long, lat = ~lat,
    fill = FALSE, weight = 2, color = "#FFFFCC", group = "Outline") %>%
  # Layers control
  addLayersControl(
    baseGroups = c("OSM (default)", "Toner", "Toner Lite"),
    overlayGroups = c("Quakes", "Outline"),
    options = layersControlOptions(collapsed = FALSE)
  )
map
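The `outline` object in the example is built with `chull()`, which may deserve a closer look. A minimal base R sketch using the built-in `quakes` data; the static plot is only an assumption about how one might check the hull without leaflet.

```r
# chull() returns the row indices of the points on the convex hull
hull_idx <- chull(quakes$long, quakes$lat)
outline <- quakes[hull_idx, ]

nrow(quakes)    # all recorded events
nrow(outline)   # only the events that lie on the hull

# Static check: all events as dots, the hull as a red polygon
plot(quakes$long, quakes$lat, pch = ".", xlab = "long", ylab = "lat")
polygon(outline$long, outline$lat, border = "red")
```

Because the hull points come back in order around the polygon, they can be passed directly to `polygon()` or `addPolygons()`.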

Creating a map with polygons

library(sp)
Sr1 = Polygon(cbind(c(2, 4, 4, 1, 2), c(2, 3, 5, 4, 2)))
Sr2 = Polygon(cbind(c(5, 4, 2, 5), c(2, 3, 2, 2)))
Sr3 = Polygon(cbind(c(4, 4, 5, 10, 4), c(5, 3, 2, 5, 5)))
Sr4 = Polygon(cbind(c(5, 6, 6, 5, 5), c(4, 4, 3, 3, 4)), hole = TRUE)
Srs1 = Polygons(list(Sr1), "s1")
Srs2 = Polygons(list(Sr2), "s2")
Srs3 = Polygons(list(Sr4, Sr3), "s3/4")
SpP = SpatialPolygons(list(Srs1, Srs2, Srs3), 1:3)
leaflet(height = "300px") %>% addPolygons(data = SpP)

Example: US states

library(maps)
mapStates = map("state", fill = TRUE, plot = FALSE)
leaflet(data = mapStates) %>% addTiles() %>%
  addPolygons(fillColor = topo.colors(10, alpha = NULL), stroke = FALSE)

The setView command

Changing the base map

m <- leaflet() %>% setView(lng = -71.0589, lat = 42.3601, zoom = 12)
m %>% addTiles()
m %>% addProviderTiles("Stamen.Toner")

Base map - CartoDB

m %>% addProviderTiles("CartoDB.Positron")

Esri.NatGeoWorldMap

m %>% addProviderTiles("Esri.NatGeoWorldMap")

OpenTopoMap

m %>% addProviderTiles("OpenTopoMap")

Thunderforest.OpenCycleMap

m %>% addProviderTiles("Thunderforest.OpenCycleMap")

Adding WMS tiles

leaflet() %>% addTiles() %>% setView(-93.65, 42.0285, zoom = 4) %>%
  addWMSTiles(
    "http://mesonet.agron.iastate.edu/cgi-bin/wms/nexrad/n0r.cgi",
    layers = "nexrad-n0r-900913",
    options = WMSTileOptions(format = "image/png", transparent = TRUE),
    attribution = "Weather data © 2012 IEM Nexrad"
  )

Combining several layers

m %>% addProviderTiles("MtbMap") %>%
  addProviderTiles("Stamen.TonerLines",
    options = providerTileOptions(opacity = 0.35)) %>%
  addProviderTiles("Stamen.TonerLabels")

Using other markers

greenLeafIcon <- makeIcon(
  iconUrl = "http://leafletjs.com/examples/custom-icons/leaf-green.png",
  iconWidth = 38, iconHeight = 95,
  iconAnchorX = 22, iconAnchorY = 94,
  shadowUrl = "http://leafletjs.com/examples/custom-icons/leaf-shadow.png",
  shadowWidth = 50, shadowHeight = 64,
  shadowAnchorX = 4, shadowAnchorY = 62
)

leaflet(data = quakes[1:4,]) %>% addTiles() %>%
  addMarkers(~long, ~lat, icon = greenLeafIcon)

Inserting other icons

menIcon <- makeIcon("https://img.clipartfest.com/707b339dc88f57bbd5d88377891131e3_bean-people-clipart-cliparts-beach-screen-with-people-clipart_344-432.jpeg",
         iconWidth = 38, iconHeight = 95,
  iconAnchorX = 22, iconAnchorY = 94)

leaflet(data = quakes[1:4,]) %>% addTiles() %>%
  addMarkers(~long, ~lat, icon = menIcon)

Cluster options for markers

leaflet(quakes) %>% addTiles() %>% addMarkers(
  clusterOptions = markerClusterOptions()
)

Adding a rectangle

leaflet() %>% addTiles() %>%
  addRectangles(
    lng1=-118.456554, lat1=34.078039,
    lng2=-118.436383, lat2=34.062717,
    fillColor = "transparent"
  )

Interactive tables with DataTables

The R package DT

install.packages('DT')
library('DT')
exdat <- read.csv("data/exdat.csv")
datatable(exdat)

Example of an interactive table

Here is the result - example of an interactive table

Changing default options

datatable(head(exdat, 20), options = list(
  columnDefs = list(list(className = 'dt-center', targets = 5)),
  pageLength = 5,
  lengthMenu = c(5, 10, 15, 20)
))

Highlighting search results

datatable(exdat, options = list(searchHighlight = TRUE), filter = 'top')

R and the JavaScript library Data-Driven Documents (D3)

JavaScript - Data-Driven Documents

ggvis

install.packages("ggvis")
library("ggvis")
library(dplyr)

Cookbook for ggvis

mtcars %>% ggvis(~wt, ~mpg) %>% layer_points()

Plots with grouping

mtcars %>% 
  ggvis(~wt, ~mpg, fill = ~factor(cyl)) %>% 
  layer_points() %>% 
  group_by(cyl) %>% 
  layer_model_predictions(model = "lm")

Interactive graphics with ggvis

mtcars %>%
  ggvis(~wt, ~mpg) %>%
  layer_smooths(span = input_slider(0.5, 1, value = 1)) %>%
  layer_points(size := input_slider(100, 1000, value = 100))

googleVis

install.packages("googleVis")
library(googleVis)

A data set with fruits

library(DT)
datatable(Fruits)

Example with googleVis

plot(gvisMotionChart(Fruits, "Fruit", "Year", options = list(width = 600, height = 400)))

Another example data set

df <- data.frame(year=1:11, x=1:11,
                 x.scope=c(rep(TRUE, 8), rep(FALSE, 3)),
                 y=11:1, y.html.tooltip=LETTERS[11:1],                 
                 y.certainty=c(rep(TRUE, 5), rep(FALSE, 6)),
                 y.emphasis=c(rep(FALSE, 4), rep(TRUE, 7)))

Another googleVis example

plot(gvisScatterChart(df,options=list(lineWidth=2)))

Click me

install.packages("devtools")
library(devtools)

install_github("nachocab/clickme")

A simple example with clickme

library(clickme)
clickme("points", 1:10)

Another clickme example

n <- 500
clickme("points",
    x = rbeta(n, 1, 10), y = rbeta(n, 1, 10),
    names = sample(letters, n, replace = TRUE),
    color_groups = sample(LETTERS[1:3], n, replace = TRUE),
    title = "Zoom Search Hover Click")

The networkD3 package

install.packages("networkD3")

An example with networkD3

library(networkD3)
src <- c("A", "A", "A", "A","B", "B", "C", "C", "D")
target <- c("B", "C", "D", "J","E", "F", "G", "H", "I")
networkData <- data.frame(src, target)
simpleNetwork(networkData)

Interactive time series with dygraphs

library(dygraphs)
dygraph(nhtemp, main = "New Haven Temperatures") %>% 
  dyRangeSelector(dateWindow = c("1920-01-01", "1960-01-01"))

The threejs package

install.packages("threejs")
library(threejs)
z <- seq(-10, 10, 0.01)
x <- cos(z)
y <- sin(z)
scatterplot3js(x,y,z, color=rainbow(length(z)))
install.packages("Rook")

Interactive graphics with D3 and plotly

plotly

plotly and R

Installing plotly

install.packages("plotly")
library("plotly")

Getting started with plotly for R

plot_ly(midwest, x = ~percollege, color = ~state, type = "box")

plotly example with your own data

url <- "https://raw.githubusercontent.com/Japhilko/GeoData/master/2015/data/whcSites.csv"
whcSites <- read.csv(url) 
plot_ly(whcSites, x = ~date_inscribed, color = ~category_short, type = "box")

Network graphs with vis.js

Introduction to visNetwork

install.packages("visNetwork")
library(visNetwork)

A minimal example

nodes <- data.frame(id = 1:3)
edges <- data.frame(from = c(1,2), to = c(1,3))
visNetwork(nodes, edges, width = "100%")

How it works

visDocumentation()
vignette("Introduction-to-visNetwork") # with CRAN version

shiny example

install.packages("shiny")
shiny::runApp(system.file("shiny", package = "visNetwork"))

Creating flowcharts with mermaid

What is this about?

install.packages('DiagrammeR')
library('DiagrammeR')

Creating a simple graph

DiagrammeR("
  graph LR
    A-->B
    A-->C
    C-->E
    B-->D
    C-->D
    D-->F
    E-->F
")

Creating a Gantt chart

DiagrammeR("
gantt
        dateFormat  YYYY-MM-DD
        title Adding GANTT diagram functionality to mermaid
        section A section
        Completed task            :done,    des1, 2014-01-06,2014-01-08
        Active task               :active,  des2, 2014-01-09, 3d
        Future task               :         des3, after des2, 5d
        Future task2               :         des4, after des3, 5d
        section Critical tasks
        Completed task in the critical line :crit, done, 2014-01-06,24h
        Implement parser and jison          :crit, done, after des1, 2d
        Create tests for parser             :crit, active, 3d
        Future task in critical line        :crit, 5d
        Create tests for renderer           :2d
        Add to mermaid                      :1d
")

Another Gantt chart

library(DiagrammeR)
mermaid("
gantt
dateFormat  YYYY-MM-DD
title A Very Nice Gantt Diagram

section Basic Tasks
This is completed             :done,          first_1,    2014-01-06, 2014-01-08
This is active                :active,        first_2,    2014-01-09, 3d
Do this later                 :               first_3,    after first_2, 5d
Do this after that            :               first_4,    after first_3, 5d

section Important Things
Completed, critical task      :crit, done,    import_1,   2014-01-06,24h
Also done, also critical      :crit, done,    import_2,   after import_1, 2d
Doing this important task now :crit, active,  import_3,   after import_2, 3d
Next critical task            :crit,          import_4,   after import_3, 5d

section The Extras
First extras                  :active,        extras_1,   after import_4,  3d
Second helping                :               extras_2,   after extras_1, 20h
More of the extras            :               extras_3,   after extras_1, 48h
")

Using internet resources and interfaces

What are APIs?

Application programming interfaces

Meaning

JavaScript Object Notation

The GeoJSON format

The structure of the data can be inspected with a JSON viewer

GeoJSON

OpenStreetMap data

Examples of GeoJSON

Importing JSON objects and XML files

Importing JavaScript Object Notation (JSON)

Downloading example data

https://overpass-turbo.eu/

Excursus: OpenStreetMap data

The jsonlite package

install.packages("jsonlite")
library(jsonlite)
citation("jsonlite")
## 
## To cite jsonlite in publications use:
## 
##   Jeroen Ooms (2014). The jsonlite Package: A Practical and
##   Consistent Mapping Between JSON Data and R Objects.
##   arXiv:1403.2805 [stat.CO] URL http://arxiv.org/abs/1403.2805.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Article{,
##     title = {The jsonlite Package: A Practical and Consistent Mapping Between JSON Data and R Objects},
##     author = {Jeroen Ooms},
##     journal = {arXiv:1403.2805 [stat.CO]},
##     year = {2014},
##     url = {http://arxiv.org/abs/1403.2805},
##   }

Importing JSON

library("jsonlite")
DRINKWATER <- fromJSON("data/RomDrinkingWater.geojson")
names(DRINKWATER)[1:3]
## [1] "type"      "generator" "copyright"
names(DRINKWATER)[4:5]
## [1] "timestamp" "features"
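How `fromJSON()` maps JSON onto R objects is easiest to see on a small input. The sketch below assumes jsonlite is installed and parses a hand-written JSON array; the records are made up for illustration and are not taken from the Rome drinking water file.

```r
library(jsonlite)

# Hypothetical JSON array of two objects with identical fields
json <- '[
  {"id": "node/1", "amenity": "drinking_water", "lon": 12.49, "lat": 41.89},
  {"id": "node/2", "amenity": "drinking_water", "lon": 12.48, "lat": 41.88}
]'

# An array of objects with the same fields becomes a data frame
fountains <- fromJSON(json)
str(fountains)

# toJSON() converts the data frame back to JSON text
toJSON(fountains, pretty = TRUE)
```

Nested objects, as in a GeoJSON file, end up as nested data frames or list columns, which is why the `features` element has columns such as `properties.amenity`.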

Looking at the data

head(DRINKWATER$features)
##      type             id properties.@id properties.amenity properties.flow
## 1 Feature node/246574149 node/246574149     drinking_water     push-button
## 2 Feature node/246574150 node/246574150     drinking_water            <NA>
## 3 Feature node/246574151 node/246574151     drinking_water            <NA>
## 4 Feature node/248743324 node/248743324     drinking_water            <NA>
## 5 Feature node/251773348 node/251773348     drinking_water            <NA>
## 6 Feature node/251773551 node/251773551     drinking_water            <NA>
##   properties.type properties.name properties.name:fr properties.wheelchair
## 1          nasone            <NA>               <NA>                  <NA>
## 2            <NA>            <NA>               <NA>                  <NA>
## 3            <NA>            <NA>               <NA>                  <NA>
## 4            <NA>            <NA>               <NA>                  <NA>
## 5          nasone            <NA>               <NA>                  <NA>
## 6            <NA>    Acqua Marcia        Eau potable                   yes
##   properties.created_by properties.indoor geometry.type
## 1                  <NA>              <NA>         Point
## 2                  <NA>              <NA>         Point
## 3                  <NA>              <NA>         Point
## 4                  <NA>              <NA>         Point
## 5                  <NA>              <NA>         Point
## 6                  <NA>              <NA>         Point
##   geometry.coordinates
## 1   12.49191, 41.89479
## 2   12.49095, 41.89489
## 3   12.48774, 41.89450
## 4   12.48773, 41.89354
## 5   12.48529, 41.88539
## 6   12.48386, 41.89332

GitHub JSON data

my_repos <- fromJSON("https://api.github.com/users/japhilko/repos")
head(my_repos)
##         id                      name                          full_name
## 1 29143362 2015-01-15-EMBLHeidelberg Japhilko/2015-01-15-EMBLHeidelberg
## 2 39427013              DataAnalysis              Japhilko/DataAnalysis
## 3 26485588            DataGeneration            Japhilko/DataGeneration
## 4 26164276                DLR_IntroR                Japhilko/DLR_IntroR
## 5 20760765                   GeoData                   Japhilko/GeoData
## 6 55756271                 geosmdata                 Japhilko/geosmdata
##   owner.login owner.id
## 1    Japhilko  7593396
## 2    Japhilko  7593396
## 3    Japhilko  7593396
## 4    Japhilko  7593396
## 5    Japhilko  7593396
## 6    Japhilko  7593396
##                                       owner.avatar_url owner.gravatar_id
## 1 https://avatars2.githubusercontent.com/u/7593396?v=3                  
## 2 https://avatars2.githubusercontent.com/u/7593396?v=3                  
## 3 https://avatars2.githubusercontent.com/u/7593396?v=3                  
## 4 https://avatars2.githubusercontent.com/u/7593396?v=3                  
## 5 https://avatars2.githubusercontent.com/u/7593396?v=3                  
## 6 https://avatars2.githubusercontent.com/u/7593396?v=3                  
##                               owner.url              owner.html_url
## 1 https://api.github.com/users/Japhilko https://github.com/Japhilko
## 2 https://api.github.com/users/Japhilko https://github.com/Japhilko
## 3 https://api.github.com/users/Japhilko https://github.com/Japhilko
## 4 https://api.github.com/users/Japhilko https://github.com/Japhilko
## 5 https://api.github.com/users/Japhilko https://github.com/Japhilko
## 6 https://api.github.com/users/Japhilko https://github.com/Japhilko
##                               owner.followers_url
## 1 https://api.github.com/users/Japhilko/followers
## 2 https://api.github.com/users/Japhilko/followers
## 3 https://api.github.com/users/Japhilko/followers
## 4 https://api.github.com/users/Japhilko/followers
## 5 https://api.github.com/users/Japhilko/followers
## 6 https://api.github.com/users/Japhilko/followers
##                                            owner.following_url
## 1 https://api.github.com/users/Japhilko/following{/other_user}
## 2 https://api.github.com/users/Japhilko/following{/other_user}
## 3 https://api.github.com/users/Japhilko/following{/other_user}
## 4 https://api.github.com/users/Japhilko/following{/other_user}
## 5 https://api.github.com/users/Japhilko/following{/other_user}
## 6 https://api.github.com/users/Japhilko/following{/other_user}
##                                         owner.gists_url
## 1 https://api.github.com/users/Japhilko/gists{/gist_id}
## 2 https://api.github.com/users/Japhilko/gists{/gist_id}
## 3 https://api.github.com/users/Japhilko/gists{/gist_id}
## 4 https://api.github.com/users/Japhilko/gists{/gist_id}
## 5 https://api.github.com/users/Japhilko/gists{/gist_id}
## 6 https://api.github.com/users/Japhilko/gists{/gist_id}
##                                              owner.starred_url
## 1 https://api.github.com/users/Japhilko/starred{/owner}{/repo}
## 2 https://api.github.com/users/Japhilko/starred{/owner}{/repo}
## 3 https://api.github.com/users/Japhilko/starred{/owner}{/repo}
## 4 https://api.github.com/users/Japhilko/starred{/owner}{/repo}
## 5 https://api.github.com/users/Japhilko/starred{/owner}{/repo}
## 6 https://api.github.com/users/Japhilko/starred{/owner}{/repo}
##                               owner.subscriptions_url
## 1 https://api.github.com/users/Japhilko/subscriptions
## 2 https://api.github.com/users/Japhilko/subscriptions
## 3 https://api.github.com/users/Japhilko/subscriptions
## 4 https://api.github.com/users/Japhilko/subscriptions
## 5 https://api.github.com/users/Japhilko/subscriptions
## 6 https://api.github.com/users/Japhilko/subscriptions
##                      owner.organizations_url
## 1 https://api.github.com/users/Japhilko/orgs
## 2 https://api.github.com/users/Japhilko/orgs
## 3 https://api.github.com/users/Japhilko/orgs
## 4 https://api.github.com/users/Japhilko/orgs
## 5 https://api.github.com/users/Japhilko/orgs
## 6 https://api.github.com/users/Japhilko/orgs
##                               owner.repos_url
## 1 https://api.github.com/users/Japhilko/repos
## 2 https://api.github.com/users/Japhilko/repos
## 3 https://api.github.com/users/Japhilko/repos
## 4 https://api.github.com/users/Japhilko/repos
## 5 https://api.github.com/users/Japhilko/repos
## 6 https://api.github.com/users/Japhilko/repos
##                                         owner.events_url
## 1 https://api.github.com/users/Japhilko/events{/privacy}
## 2 https://api.github.com/users/Japhilko/events{/privacy}
## 3 https://api.github.com/users/Japhilko/events{/privacy}
## 4 https://api.github.com/users/Japhilko/events{/privacy}
## 5 https://api.github.com/users/Japhilko/events{/privacy}
## 6 https://api.github.com/users/Japhilko/events{/privacy}
##                               owner.received_events_url owner.type
## 1 https://api.github.com/users/Japhilko/received_events       User
## 2 https://api.github.com/users/Japhilko/received_events       User
## 3 https://api.github.com/users/Japhilko/received_events       User
## 4 https://api.github.com/users/Japhilko/received_events       User
## 5 https://api.github.com/users/Japhilko/received_events       User
## 6 https://api.github.com/users/Japhilko/received_events       User
##   owner.site_admin private
## 1            FALSE   FALSE
## 2            FALSE   FALSE
## 3            FALSE   FALSE
## 4            FALSE   FALSE
## 5            FALSE   FALSE
## 6            FALSE   FALSE
##                                                html_url
## 1 https://github.com/Japhilko/2015-01-15-EMBLHeidelberg
## 2              https://github.com/Japhilko/DataAnalysis
## 3            https://github.com/Japhilko/DataGeneration
## 4                https://github.com/Japhilko/DLR_IntroR
## 5                   https://github.com/Japhilko/GeoData
## 6                 https://github.com/Japhilko/geosmdata
##                                        description  fork
## 1  R programming and development (EMBL, Jan 2015)   TRUE
## 2                     My research on data analysis FALSE
## 3              Rcode for generating synthatic data FALSE
## 4                      Unterlagen für DLR Workshop FALSE
## 5               Research on statistics and geodata FALSE
## 6             package to import OpenstreetMap data FALSE
##                                                               url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg
## 2              https://api.github.com/repos/Japhilko/DataAnalysis
## 3            https://api.github.com/repos/Japhilko/DataGeneration
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR
## 5                   https://api.github.com/repos/Japhilko/GeoData
## 6                 https://api.github.com/repos/Japhilko/geosmdata
##                                                               forks_url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg/forks
## 2              https://api.github.com/repos/Japhilko/DataAnalysis/forks
## 3            https://api.github.com/repos/Japhilko/DataGeneration/forks
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR/forks
## 5                   https://api.github.com/repos/Japhilko/GeoData/forks
## 6                 https://api.github.com/repos/Japhilko/geosmdata/forks
##                                                                        keys_url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg/keys{/key_id}
## 2              https://api.github.com/repos/Japhilko/DataAnalysis/keys{/key_id}
## 3            https://api.github.com/repos/Japhilko/DataGeneration/keys{/key_id}
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR/keys{/key_id}
## 5                   https://api.github.com/repos/Japhilko/GeoData/keys{/key_id}
## 6                 https://api.github.com/repos/Japhilko/geosmdata/keys{/key_id}
##                                                                              collaborators_url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg/collaborators{/collaborator}
## 2              https://api.github.com/repos/Japhilko/DataAnalysis/collaborators{/collaborator}
## 3            https://api.github.com/repos/Japhilko/DataGeneration/collaborators{/collaborator}
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR/collaborators{/collaborator}
## 5                   https://api.github.com/repos/Japhilko/GeoData/collaborators{/collaborator}
## 6                 https://api.github.com/repos/Japhilko/geosmdata/collaborators{/collaborator}
##                                                               teams_url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg/teams
## 2              https://api.github.com/repos/Japhilko/DataAnalysis/teams
## 3            https://api.github.com/repos/Japhilko/DataGeneration/teams
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR/teams
## 5                   https://api.github.com/repos/Japhilko/GeoData/teams
## 6                 https://api.github.com/repos/Japhilko/geosmdata/teams
##                                                               hooks_url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg/hooks
## 2              https://api.github.com/repos/Japhilko/DataAnalysis/hooks
## 3            https://api.github.com/repos/Japhilko/DataGeneration/hooks
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR/hooks
## 5                   https://api.github.com/repos/Japhilko/GeoData/hooks
## 6                 https://api.github.com/repos/Japhilko/geosmdata/hooks
##                                                                         issue_events_url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg/issues/events{/number}
## 2              https://api.github.com/repos/Japhilko/DataAnalysis/issues/events{/number}
## 3            https://api.github.com/repos/Japhilko/DataGeneration/issues/events{/number}
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR/issues/events{/number}
## 5                   https://api.github.com/repos/Japhilko/GeoData/issues/events{/number}
## 6                 https://api.github.com/repos/Japhilko/geosmdata/issues/events{/number}
##                                                               events_url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg/events
## 2              https://api.github.com/repos/Japhilko/DataAnalysis/events
## 3            https://api.github.com/repos/Japhilko/DataGeneration/events
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR/events
## 5                   https://api.github.com/repos/Japhilko/GeoData/events
## 6                 https://api.github.com/repos/Japhilko/geosmdata/events
##                                                                      assignees_url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg/assignees{/user}
## 2              https://api.github.com/repos/Japhilko/DataAnalysis/assignees{/user}
## 3            https://api.github.com/repos/Japhilko/DataGeneration/assignees{/user}
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR/assignees{/user}
## 5                   https://api.github.com/repos/Japhilko/GeoData/assignees{/user}
## 6                 https://api.github.com/repos/Japhilko/geosmdata/assignees{/user}
##                                                                        branches_url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg/branches{/branch}
## 2              https://api.github.com/repos/Japhilko/DataAnalysis/branches{/branch}
## 3            https://api.github.com/repos/Japhilko/DataGeneration/branches{/branch}
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR/branches{/branch}
## 5                   https://api.github.com/repos/Japhilko/GeoData/branches{/branch}
## 6                 https://api.github.com/repos/Japhilko/geosmdata/branches{/branch}
##                                                               tags_url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg/tags
## 2              https://api.github.com/repos/Japhilko/DataAnalysis/tags
## 3            https://api.github.com/repos/Japhilko/DataGeneration/tags
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR/tags
## 5                   https://api.github.com/repos/Japhilko/GeoData/tags
## 6                 https://api.github.com/repos/Japhilko/geosmdata/tags
##                                                                         blobs_url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg/git/blobs{/sha}
## 2              https://api.github.com/repos/Japhilko/DataAnalysis/git/blobs{/sha}
## 3            https://api.github.com/repos/Japhilko/DataGeneration/git/blobs{/sha}
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR/git/blobs{/sha}
## 5                   https://api.github.com/repos/Japhilko/GeoData/git/blobs{/sha}
## 6                 https://api.github.com/repos/Japhilko/geosmdata/git/blobs{/sha}
##                                                                     git_tags_url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg/git/tags{/sha}
## 2              https://api.github.com/repos/Japhilko/DataAnalysis/git/tags{/sha}
## 3            https://api.github.com/repos/Japhilko/DataGeneration/git/tags{/sha}
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR/git/tags{/sha}
## 5                   https://api.github.com/repos/Japhilko/GeoData/git/tags{/sha}
## 6                 https://api.github.com/repos/Japhilko/geosmdata/git/tags{/sha}
##                                                                     git_refs_url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg/git/refs{/sha}
## 2              https://api.github.com/repos/Japhilko/DataAnalysis/git/refs{/sha}
## 3            https://api.github.com/repos/Japhilko/DataGeneration/git/refs{/sha}
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR/git/refs{/sha}
## 5                   https://api.github.com/repos/Japhilko/GeoData/git/refs{/sha}
## 6                 https://api.github.com/repos/Japhilko/geosmdata/git/refs{/sha}
##                                                                         trees_url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg/git/trees{/sha}
## 2              https://api.github.com/repos/Japhilko/DataAnalysis/git/trees{/sha}
## 3            https://api.github.com/repos/Japhilko/DataGeneration/git/trees{/sha}
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR/git/trees{/sha}
## 5                   https://api.github.com/repos/Japhilko/GeoData/git/trees{/sha}
## 6                 https://api.github.com/repos/Japhilko/geosmdata/git/trees{/sha}
##                                                                     statuses_url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg/statuses/{sha}
## 2              https://api.github.com/repos/Japhilko/DataAnalysis/statuses/{sha}
## 3            https://api.github.com/repos/Japhilko/DataGeneration/statuses/{sha}
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR/statuses/{sha}
## 5                   https://api.github.com/repos/Japhilko/GeoData/statuses/{sha}
## 6                 https://api.github.com/repos/Japhilko/geosmdata/statuses/{sha}
##                                                               languages_url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg/languages
## 2              https://api.github.com/repos/Japhilko/DataAnalysis/languages
## 3            https://api.github.com/repos/Japhilko/DataGeneration/languages
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR/languages
## 5                   https://api.github.com/repos/Japhilko/GeoData/languages
## 6                 https://api.github.com/repos/Japhilko/geosmdata/languages
##                                                               stargazers_url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg/stargazers
## 2              https://api.github.com/repos/Japhilko/DataAnalysis/stargazers
## 3            https://api.github.com/repos/Japhilko/DataGeneration/stargazers
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR/stargazers
## 5                   https://api.github.com/repos/Japhilko/GeoData/stargazers
## 6                 https://api.github.com/repos/Japhilko/geosmdata/stargazers
##                                                               contributors_url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg/contributors
## 2              https://api.github.com/repos/Japhilko/DataAnalysis/contributors
## 3            https://api.github.com/repos/Japhilko/DataGeneration/contributors
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR/contributors
## 5                   https://api.github.com/repos/Japhilko/GeoData/contributors
## 6                 https://api.github.com/repos/Japhilko/geosmdata/contributors
##                                                               subscribers_url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg/subscribers
## 2              https://api.github.com/repos/Japhilko/DataAnalysis/subscribers
## 3            https://api.github.com/repos/Japhilko/DataGeneration/subscribers
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR/subscribers
## 5                   https://api.github.com/repos/Japhilko/GeoData/subscribers
## 6                 https://api.github.com/repos/Japhilko/geosmdata/subscribers
##                                                               subscription_url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg/subscription
## 2              https://api.github.com/repos/Japhilko/DataAnalysis/subscription
## 3            https://api.github.com/repos/Japhilko/DataGeneration/subscription
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR/subscription
## 5                   https://api.github.com/repos/Japhilko/GeoData/subscription
## 6                 https://api.github.com/repos/Japhilko/geosmdata/subscription
##                                                                     commits_url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg/commits{/sha}
## 2              https://api.github.com/repos/Japhilko/DataAnalysis/commits{/sha}
## 3            https://api.github.com/repos/Japhilko/DataGeneration/commits{/sha}
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR/commits{/sha}
## 5                   https://api.github.com/repos/Japhilko/GeoData/commits{/sha}
## 6                 https://api.github.com/repos/Japhilko/geosmdata/commits{/sha}
##                                                                     git_commits_url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg/git/commits{/sha}
## 2              https://api.github.com/repos/Japhilko/DataAnalysis/git/commits{/sha}
## 3            https://api.github.com/repos/Japhilko/DataGeneration/git/commits{/sha}
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR/git/commits{/sha}
## 5                   https://api.github.com/repos/Japhilko/GeoData/git/commits{/sha}
## 6                 https://api.github.com/repos/Japhilko/geosmdata/git/commits{/sha}
##                                                                        comments_url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg/comments{/number}
## 2              https://api.github.com/repos/Japhilko/DataAnalysis/comments{/number}
## 3            https://api.github.com/repos/Japhilko/DataGeneration/comments{/number}
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR/comments{/number}
## 5                   https://api.github.com/repos/Japhilko/GeoData/comments{/number}
## 6                 https://api.github.com/repos/Japhilko/geosmdata/comments{/number}
##                                                                          issue_comment_url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg/issues/comments{/number}
## 2              https://api.github.com/repos/Japhilko/DataAnalysis/issues/comments{/number}
## 3            https://api.github.com/repos/Japhilko/DataGeneration/issues/comments{/number}
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR/issues/comments{/number}
## 5                   https://api.github.com/repos/Japhilko/GeoData/issues/comments{/number}
## 6                 https://api.github.com/repos/Japhilko/geosmdata/issues/comments{/number}
##                                                                       contents_url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg/contents/{+path}
## 2              https://api.github.com/repos/Japhilko/DataAnalysis/contents/{+path}
## 3            https://api.github.com/repos/Japhilko/DataGeneration/contents/{+path}
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR/contents/{+path}
## 5                   https://api.github.com/repos/Japhilko/GeoData/contents/{+path}
## 6                 https://api.github.com/repos/Japhilko/geosmdata/contents/{+path}
##                                                                               compare_url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg/compare/{base}...{head}
## 2              https://api.github.com/repos/Japhilko/DataAnalysis/compare/{base}...{head}
## 3            https://api.github.com/repos/Japhilko/DataGeneration/compare/{base}...{head}
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR/compare/{base}...{head}
## 5                   https://api.github.com/repos/Japhilko/GeoData/compare/{base}...{head}
## 6                 https://api.github.com/repos/Japhilko/geosmdata/compare/{base}...{head}
##                                                               merges_url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg/merges
## 2              https://api.github.com/repos/Japhilko/DataAnalysis/merges
## 3            https://api.github.com/repos/Japhilko/DataGeneration/merges
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR/merges
## 5                   https://api.github.com/repos/Japhilko/GeoData/merges
## 6                 https://api.github.com/repos/Japhilko/geosmdata/merges
##                                                                              archive_url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg/{archive_format}{/ref}
## 2              https://api.github.com/repos/Japhilko/DataAnalysis/{archive_format}{/ref}
## 3            https://api.github.com/repos/Japhilko/DataGeneration/{archive_format}{/ref}
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR/{archive_format}{/ref}
## 5                   https://api.github.com/repos/Japhilko/GeoData/{archive_format}{/ref}
## 6                 https://api.github.com/repos/Japhilko/geosmdata/{archive_format}{/ref}
##                                                               downloads_url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg/downloads
## 2              https://api.github.com/repos/Japhilko/DataAnalysis/downloads
## 3            https://api.github.com/repos/Japhilko/DataGeneration/downloads
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR/downloads
## 5                   https://api.github.com/repos/Japhilko/GeoData/downloads
## 6                 https://api.github.com/repos/Japhilko/geosmdata/downloads
##                                                                        issues_url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg/issues{/number}
## 2              https://api.github.com/repos/Japhilko/DataAnalysis/issues{/number}
## 3            https://api.github.com/repos/Japhilko/DataGeneration/issues{/number}
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR/issues{/number}
## 5                   https://api.github.com/repos/Japhilko/GeoData/issues{/number}
## 6                 https://api.github.com/repos/Japhilko/geosmdata/issues{/number}
##                                                                        pulls_url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg/pulls{/number}
## 2              https://api.github.com/repos/Japhilko/DataAnalysis/pulls{/number}
## 3            https://api.github.com/repos/Japhilko/DataGeneration/pulls{/number}
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR/pulls{/number}
## 5                   https://api.github.com/repos/Japhilko/GeoData/pulls{/number}
## 6                 https://api.github.com/repos/Japhilko/geosmdata/pulls{/number}
##                                                                        milestones_url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg/milestones{/number}
## 2              https://api.github.com/repos/Japhilko/DataAnalysis/milestones{/number}
## 3            https://api.github.com/repos/Japhilko/DataGeneration/milestones{/number}
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR/milestones{/number}
## 5                   https://api.github.com/repos/Japhilko/GeoData/milestones{/number}
## 6                 https://api.github.com/repos/Japhilko/geosmdata/milestones{/number}
##                                                                                         notifications_url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg/notifications{?since,all,participating}
## 2              https://api.github.com/repos/Japhilko/DataAnalysis/notifications{?since,all,participating}
## 3            https://api.github.com/repos/Japhilko/DataGeneration/notifications{?since,all,participating}
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR/notifications{?since,all,participating}
## 5                   https://api.github.com/repos/Japhilko/GeoData/notifications{?since,all,participating}
## 6                 https://api.github.com/repos/Japhilko/geosmdata/notifications{?since,all,participating}
##                                                                      labels_url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg/labels{/name}
## 2              https://api.github.com/repos/Japhilko/DataAnalysis/labels{/name}
## 3            https://api.github.com/repos/Japhilko/DataGeneration/labels{/name}
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR/labels{/name}
## 5                   https://api.github.com/repos/Japhilko/GeoData/labels{/name}
## 6                 https://api.github.com/repos/Japhilko/geosmdata/labels{/name}
##                                                                    releases_url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg/releases{/id}
## 2              https://api.github.com/repos/Japhilko/DataAnalysis/releases{/id}
## 3            https://api.github.com/repos/Japhilko/DataGeneration/releases{/id}
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR/releases{/id}
## 5                   https://api.github.com/repos/Japhilko/GeoData/releases{/id}
## 6                 https://api.github.com/repos/Japhilko/geosmdata/releases{/id}
##                                                               deployments_url
## 1 https://api.github.com/repos/Japhilko/2015-01-15-EMBLHeidelberg/deployments
## 2              https://api.github.com/repos/Japhilko/DataAnalysis/deployments
## 3            https://api.github.com/repos/Japhilko/DataGeneration/deployments
## 4                https://api.github.com/repos/Japhilko/DLR_IntroR/deployments
## 5                   https://api.github.com/repos/Japhilko/GeoData/deployments
## 6                 https://api.github.com/repos/Japhilko/geosmdata/deployments
##             created_at           updated_at            pushed_at
## 1 2015-01-12T15:59:33Z 2015-01-12T15:59:34Z 2015-01-10T22:26:12Z
## 2 2015-07-21T06:00:37Z 2016-02-04T13:01:54Z 2017-04-24T14:20:11Z
## 3 2014-11-11T13:14:01Z 2015-04-21T14:51:01Z 2015-07-27T13:59:39Z
## 4 2014-11-04T10:34:17Z 2016-07-26T08:22:47Z 2016-08-11T13:23:54Z
## 5 2014-06-12T08:51:41Z 2017-03-23T06:00:42Z 2017-03-23T15:31:16Z
## 6 2016-04-08T06:35:45Z 2016-06-06T10:36:01Z 2016-06-08T11:06:58Z
##                                                   git_url
## 1 git://github.com/Japhilko/2015-01-15-EMBLHeidelberg.git
## 2              git://github.com/Japhilko/DataAnalysis.git
## 3            git://github.com/Japhilko/DataGeneration.git
## 4                git://github.com/Japhilko/DLR_IntroR.git
## 5                   git://github.com/Japhilko/GeoData.git
## 6                 git://github.com/Japhilko/geosmdata.git
##                                                 ssh_url
## 1 git@github.com:Japhilko/2015-01-15-EMBLHeidelberg.git
## 2              git@github.com:Japhilko/DataAnalysis.git
## 3            git@github.com:Japhilko/DataGeneration.git
## 4                git@github.com:Japhilko/DLR_IntroR.git
## 5                   git@github.com:Japhilko/GeoData.git
## 6                 git@github.com:Japhilko/geosmdata.git
##                                                   clone_url
## 1 https://github.com/Japhilko/2015-01-15-EMBLHeidelberg.git
## 2              https://github.com/Japhilko/DataAnalysis.git
## 3            https://github.com/Japhilko/DataGeneration.git
## 4                https://github.com/Japhilko/DLR_IntroR.git
## 5                   https://github.com/Japhilko/GeoData.git
## 6                 https://github.com/Japhilko/geosmdata.git
##                                                 svn_url homepage    size
## 1 https://github.com/Japhilko/2015-01-15-EMBLHeidelberg     <NA>    5667
## 2              https://github.com/Japhilko/DataAnalysis     <NA>   55636
## 3            https://github.com/Japhilko/DataGeneration     <NA>     336
## 4                https://github.com/Japhilko/DLR_IntroR     <NA>   32546
## 5                   https://github.com/Japhilko/GeoData     <NA> 1589706
## 6                 https://github.com/Japhilko/geosmdata     <NA>   19931
##   stargazers_count watchers_count     language has_issues has_projects
## 1                0              0          TeX      FALSE         TRUE
## 2                0              0         HTML       TRUE         TRUE
## 3                0              0            R       TRUE         TRUE
## 4                2              2            R       TRUE         TRUE
## 5                6              6         HTML       TRUE         TRUE
## 6                0              0 ActionScript       TRUE         TRUE
##   has_downloads has_wiki has_pages forks_count mirror_url
## 1          TRUE     TRUE     FALSE           0         NA
## 2          TRUE     TRUE     FALSE           1         NA
## 3          TRUE     TRUE      TRUE           0         NA
## 4          TRUE     TRUE     FALSE           0         NA
## 5          TRUE     TRUE      TRUE           1         NA
## 6          TRUE     TRUE     FALSE           0         NA
##   open_issues_count forks open_issues watchers default_branch
## 1                 0     0           0        0         master
## 2                 0     1           0        0         master
## 3                 0     0           0        0         master
## 4                 0     0           0        2         master
## 5                 1     1           1        6         master
## 6                 0     0           0        0         master

Another example of JSON data

Reading Ergast data

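library(jsonlite)
library(DT)
# Results of round 1 of the 2004 Formula 1 season, delivered as JSON
res <- fromJSON('http://ergast.com/api/f1/2004/1/results.json')
# The driver details are a nested data frame inside the first race's results
drivers <- res$MRData$RaceTable$Races$Results[[1]]$Driver
# Show the drivers as an interactive table
datatable(drivers)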
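
The chain of `$` accessors follows the nesting of the JSON document. A minimal sketch with a made-up miniature document (the values are invented for illustration, only the nesting imitates the Ergast response) shows how `fromJSON()` turns nested arrays of objects into nested data frames:

```r
library(jsonlite)

# Hypothetical miniature JSON; only the nesting imitates the Ergast API
txt <- '{"MRData": {"RaceTable": {"Races": [
  {"raceName": "Example Grand Prix",
   "Results": [
     {"position": "1", "Driver": {"driverId": "a", "familyName": "Alpha"}},
     {"position": "2", "Driver": {"driverId": "b", "familyName": "Beta"}}
   ]}
]}}}'

res <- fromJSON(txt)

# Races is a data frame with one row; its Results column is a list
races <- res$MRData$RaceTable$Races
# The first list element is the results table of the first race;
# Driver is itself a data frame stored as a column
drivers <- races$Results[[1]]$Driver
str(drivers)
```

Because every level is either a list or a data frame, `str()` is a convenient way to find the path to the part of interest before writing the accessor chain.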

New York Times data

New York Times example

article_key <- "&api-key=c2fede7bd9aea57c898f538e5ec0a1ee:6:68700045"
url <- "http://api.nytimes.com/svc/search/v2/articlesearch.json?q=obamacare+socialism"
req <- fromJSON(paste0(url, article_key))
articles <- req$response$docs
datatable(articles)
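
Hard-coding the key in a script is convenient on a slide but awkward to share. One possible variant, sketched under the assumption that the key is stored in an environment variable called `NYT_API_KEY` (a name chosen here purely for illustration), assembles the request URL from its parts. `build_nyt_url()` is a hypothetical helper, not part of any package:

```r
# Hypothetical helper: assemble an article search URL from its parts
build_nyt_url <- function(query, key) {
  base <- "http://api.nytimes.com/svc/search/v2/articlesearch.json"
  # URLencode() escapes spaces and other reserved characters in the query
  paste0(base, "?q=", utils::URLencode(query, reserved = TRUE),
         "&api-key=", key)
}

# Assumption: the key was stored beforehand, e.g. via Sys.setenv()
key <- Sys.getenv("NYT_API_KEY", unset = "YOUR-KEY")
nyt_url <- build_nyt_url("obamacare socialism", key)
nyt_url
```

The resulting string can be passed to `fromJSON()` in the same way as the pasted URL in the example.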

Reading XML files

Importing XML files

install.packages("XML")
library(XML)
citation("XML")
## 
## To cite package 'XML' in publications use:
## 
##   Duncan Temple Lang and the CRAN Team (2016). XML: Tools for
##   Parsing and Generating XML Within R and S-Plus. R package
##   version 3.98-1.5. https://CRAN.R-project.org/package=XML
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {XML: Tools for Parsing and Generating XML Within R and S-Plus},
##     author = {Duncan Temple Lang and the CRAN Team},
##     year = {2016},
##     note = {R package version 3.98-1.5},
##     url = {https://CRAN.R-project.org/package=XML},
##   }
## 
## ATTENTION: This citation information has been auto-generated from
## the package DESCRIPTION file and may need manual editing, see
## 'help("citation")'.

The R package XML - Gaston Sanchez

library("XML")

Gaston Sanchez - Dataflow

His work can be seen here.

Working with XML data

Gaston Sanchez - Getting web data

Functions in the XML package

Function         Description
xmlName()        name of the node
xmlSize()        number of subnodes
xmlAttrs()       named character vector of all attributes
xmlGetAttr()     value of a single attribute
xmlValue()       contents of a leaf node
xmlParent()      parent node
xmlAncestors()   ancestor nodes
getSibling()     sibling to the right or to the left
xmlNamespace()   the namespace (if there is one)
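
A few of these accessors can be tried without a network connection. The following sketch parses a made-up snippet in the style of an OpenStreetMap response (all ids, coordinates and tag values are invented) and assumes the XML package is installed:

```r
library(XML)

# Made-up snippet in the style of an OpenStreetMap response
txt <- '<osm version="0.6"><node id="1" lat="49.48" lon="8.46"><tag k="name" v="Example"/><tag k="amenity" v="university"/></node></osm>'
doc <- xmlParse(txt, asText = TRUE)

root <- xmlRoot(doc)
xmlName(root)            # name of the root node
xmlSize(root)            # number of child nodes of the root

node <- root[[1]]
xmlAttrs(node)           # all attributes as a named character vector
xmlGetAttr(node, "lat")  # a single attribute
xmlName(xmlParent(node)) # xmlParent() returns a node, so ask for its name
sapply(xmlChildren(node), xmlGetAttr, "k")  # keys of the child tags
```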

The newer xml2 package

install.packages("xml2")
library(xml2)
citation("xml2")
## 
## To cite package 'xml2' in publications use:
## 
##   Hadley Wickham and James Hester (2016). xml2: Parse XML. R
##   package version 1.0.0. https://CRAN.R-project.org/package=xml2
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {xml2: Parse XML},
##     author = {Hadley Wickham and James Hester},
##     year = {2016},
##     note = {R package version 1.0.0},
##     url = {https://CRAN.R-project.org/package=xml2},
##   }
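
For comparison, a minimal sketch of the same kind of query with xml2. The snippet is made up for illustration; `read_xml()`, `xml_find_all()`, `xml_find_first()` and `xml_attr()` are xml2 functions and are not interchangeable with `xmlParse()` and `xpathApply()` from the XML package:

```r
library(xml2)

# Made-up snippet; read_xml() also accepts a file path or a URL
doc <- read_xml('<osm><node id="1"><tag k="name" v="Example"/><tag k="amenity" v="university"/></node></osm>')

tags <- xml_find_all(doc, "//tag")   # XPath query, returns a node set
keys <- xml_attr(tags, "k")          # attribute k of every matched node
amenity <- xml_attr(xml_find_first(doc, "//tag[@k = 'amenity']"), "v")
keys
amenity
```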

Example data - the OpenStreetMap API

Finding the OpenStreetMap ID

Finding individual objects

<www.openstreetmap.org/export>

osm export

Downloading OSM extracts

<www.openstreetmap.org/export>

osm export

First example

url <- "http://api.openstreetmap.org/api/0.6/relation/62422"
library(XML)  # xmlParse() comes from the XML package, not from xml2
BE <- xmlParse(url)

Administrative boundaries of Berlin

Analysing the XML

xmltop <- xmlRoot(BE)
class(xmltop)
## [1] "XMLInternalElementNode" "XMLInternalNode"       
## [3] "XMLAbstractNode"
xmlSize(xmltop)
## [1] 1
xmlSize(xmltop[[1]])
## [1] 328

Using XPath

XPath, the XML Path Language, is a query language for selecting nodes from an XML document.

xpathApply(BE,"//tag[@k = 'source:population']")
## [[1]]
## <tag k="source:population" v="http://www.statistik-berlin-brandenburg.de/Publikationen/Stat_Berichte/2010/SB_A1-1_A2-4_q01-10_BE.pdf 2010-10-01"/> 
## 
## attr(,"class")
## [1] "XMLNodeSet"
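
The expression has two parts: `//tag` selects every `tag` element in the document, and the predicate in square brackets keeps only those whose attribute `k` has the given value. A minimal offline sketch with a made-up document (the tag values are invented) shows the pattern and one way to pull out the attribute value instead of the whole node:

```r
library(XML)

# Made-up document in the style of an OSM relation
txt <- '<osm><relation id="1"><tag k="name" v="Example City"/><tag k="population" v="12345"/></relation></osm>'
doc <- xmlParse(txt, asText = TRUE)

# //tag          : every <tag> element anywhere in the document
# [@k = '...']   : keep only those whose attribute k has this value
pop_nodes <- xpathApply(doc, "//tag[@k = 'population']")

# xpathSApply() applies a function to each match and simplifies the result
population <- xpathSApply(doc, "//tag[@k = 'population']", xmlGetAttr, "v")
as.numeric(population)
```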

Example: administrative boundaries of Berlin

Administrative boundaries for Germany

url <- "http://api.openstreetmap.org/api/0.6/relation/62422"
BE <- xmlParse(url)

Administrative boundaries of Berlin

Source for the population size

xpathApply(BE,"//tag[@k = 'source:population']")
## [[1]]
## <tag k="source:population" v="http://www.statistik-berlin-brandenburg.de/Publikationen/Stat_Berichte/2010/SB_A1-1_A2-4_q01-10_BE.pdf 2010-10-01"/> 
## 
## attr(,"class")
## [1] "XMLNodeSet"

- Statistik Berlin Brandenburg

Somewhat surprising: the relation also carries a name in Tamil (key name:ta):

xpathApply(BE,"//tag[@k = 'name:ta']")
## [[1]]
## <tag k="name:ta" v="<U+0BAA><U+0BC6><U+0BB0><U+0BCD><U+0BB2><U+0BBF><U+0BA9><U+0BCD>"/> 
## 
## attr(,"class")
## [1] "XMLNodeSet"

Geographical region

region <- xpathApply(BE,
  "//tag[@k = 'geographical_region']")
# regular expressions
region[[1]]
## <tag k="geographical_region" v="Barnim;Berliner Urstromtal;Teltow;Nauener Platte"/>
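
The value of this tag packs several regions into one string, separated by semicolons. A minimal sketch of how such a value could be split, with the string copied from the output and both variants limited to base R:

```r
# Value of the v attribute as shown in the output
v <- "Barnim;Berliner Urstromtal;Teltow;Nauener Platte"

# fixed = TRUE treats ";" as a literal character rather than a regex
regions <- strsplit(v, ";", fixed = TRUE)[[1]]
regions

# A regular expression additionally tolerates blanks around the separator
regions_re <- strsplit(v, "\\s*;\\s*")[[1]]
```

For this particular string both variants give the same four elements; the regular expression only matters when values are written as `"a; b"`.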

District

Barnim

Another example

url2<-"http://api.openstreetmap.org/api/0.6/node/25113879"
obj2<-xmlParse(url2)
obj_amenity<-xpathApply(obj2,"//tag[@k = 'amenity']")[[1]]
obj_amenity
## <tag k="amenity" v="university"/>

Wikipedia article

xpathApply(obj2,"//tag[@k = 'wikipedia']")[[1]]
## <tag k="wikipedia" v="de:Universität Mannheim"/>
xpathApply(obj2,"//tag[@k = 'wheelchair']")[[1]]
## <tag k="wheelchair" v="limited"/>
xpathApply(obj2,"//tag[@k = 'name']")[[1]]
## <tag k="name" v="Universität Mannheim"/>

The C and the A

url3<-"http://api.openstreetmap.org/api/0.6/node/303550876"
obj3 <- xmlParse(url3)
xpathApply(obj3,"//tag[@k = 'opening_hours']")[[1]]
## <tag k="opening_hours" v="Mo-Sa 09:00-20:00; Su,PH off"/>

Only flying is nicer

url5<-"http://api.openstreetmap.org/api/0.6/way/162149882"
obj5<-xmlParse(url5)
xpathApply(obj5,"//tag[@k = 'name']")[[1]]
## <tag k="name" v="City-Airport Mannheim"/>
xpathApply(obj5,"//tag[@k = 'website']")[[1]]
## <tag k="website" v="http://www.flugplatz-mannheim.de/"/>
xpathApply(obj5,"//tag[@k = 'iata']")[[1]]
## <tag k="iata" v="MHG"/>

Parsing a node

url2 <- "http://api.openstreetmap.org/api/0.6/node/2923760808"
RennesBa <- xmlParse(url2)
RennesBa
## <?xml version="1.0" encoding="UTF-8"?>
## <osm version="0.6" generator="CGImap 0.6.0 (3583 thorn-03.openstreetmap.org)" copyright="OpenStreetMap and contributors" attribution="http://www.openstreetmap.org/copyright" license="http://opendatacommons.org/licenses/odbl/1-0/">
##   <node id="2923760808" visible="true" version="7" changeset="47392918" timestamp="2017-04-02T20:42:05Z" user="FrShaft" uid="2377664" lat="48.1068780" lon="-1.6730415">
##     <tag k="addr:city" v="Rennes"/>
##     <tag k="addr:country" v="FR"/>
##     <tag k="addr:housenumber" v="25"/>
##     <tag k="addr:postcode" v="35000"/>
##     <tag k="addr:street" v="Avenue Jean Janvier"/>
##     <tag k="amenity" v="restaurant"/>
##     <tag k="capacity" v="90"/>
##     <tag k="name" v="Il Basilico"/>
##     <tag k="source:addr:housenumber" v="Rennes Métropole"/>
##     <tag k="source:addr:housenumber:ref" v="66075"/>
##     <tag k="source:addr:housenumber:version" v="2013-04-02"/>
##     <tag k="website" v="http://ilbasilico.fr"/>
##     <tag k="wheelchair" v="limited"/>
##     <tag k="wheelchair:description" v="Aucune sonnette pour indiquer sa présence mais une rampe d'accès peut être déployée."/>
##   </node>
## </osm>
## 

Parsing a way

url3 <- "http://api.openstreetmap.org/api/0.6/way/72799743"
MadCalle <- xmlParse(url3)
MadCalle
## <?xml version="1.0" encoding="UTF-8"?>
## <osm version="0.6" generator="CGImap 0.6.0 (31542 thorn-01.openstreetmap.org)" copyright="OpenStreetMap and contributors" attribution="http://www.openstreetmap.org/copyright" license="http://opendatacommons.org/licenses/odbl/1-0/">
##   <way id="72799743" visible="true" version="5" changeset="11915713" timestamp="2012-06-16T14:49:40Z" user="Montgomery" uid="211405">
##     <nd ref="869268876"/>
##     <nd ref="1790008568"/>
##     <nd ref="864117544"/>
##     <nd ref="1790008571"/>
##     <nd ref="1790008601"/>
##     <nd ref="864117511"/>
##     <nd ref="1790008612"/>
##     <nd ref="1790008618"/>
##     <nd ref="864117819"/>
##     <tag k="highway" v="residential"/>
##     <tag k="name" v="Calle Alfonso Ercilla"/>
##     <tag k="oneway" v="yes"/>
##     <tag k="surface" v="asphalt"/>
##   </way>
## </osm>
## 

The Overpass API

Figure: Overpass API logo

The Overpass API is a read-only API that serves up custom selected parts of the OSM map data.

(http://wiki.openstreetmap.org/wiki/Overpass_API)

Important information

http://wiki.openstreetmap.org/wiki/Map_Features

Figure: OSM map features

Example: using the Overpass API

Figure: playgrounds in Mannheim

Exporting the raw data

Figure: raw data export

Importing from the Overpass API into R

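library(XML)
place <- "Mannheim"
type_obj <- "node"
object <- "leisure=playground"

# Link1 is not defined on this slide; it must hold the start of the
# Overpass request URL, up to and including the opening quote of the place name
InfoList <- xmlParse(paste(Link1,place,"\"];",
type_obj,"(area)[",object,"];out;",sep=""))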
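
To see what the `paste()` call sends to the server, it helps to build the URL without parsing it. The sketch below does that in base R. The value of `Link1` is an assumption, since the slides do not show it; it is a guess based on the public Overpass interpreter endpoint.

```r
# ASSUMPTION: this prefix is not shown in the slides; it is a guess at what
# Link1 contains, based on the public Overpass interpreter endpoint.
Link1 <- "http://www.overpass-api.de/api/interpreter?data=area[name=\""

place    <- "Mannheim"
type_obj <- "node"
object   <- "leisure=playground"

# Assemble the request URL from the same pieces as in the xmlParse() call
query_url <- paste(Link1, place, "\"];",
                   type_obj, "(area)[", object, "];out;", sep = "")
query_url

# The Overpass QL part of the URL, i.e. everything after "data="
overpass_ql <- sub("^.*data=", "", query_url)
overpass_ql
```

Under this assumption the query reads: select the area named Mannheim, take all nodes inside it that are tagged `leisure=playground`, and print them.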

XML Output

Figure: playgrounds in Mannheim

Working with XML data (XPath)

The list of IDs of all nodes tagged with the value playground:

node_id <- xpathApply(InfoList,
  "//tag[@v = 'playground']/parent::node/@id")
# node_id[[1]]

Figure: first node ID

Getting latitude and longitude

lat_x <- xpathApply(InfoList,
  "//tag[@v = 'playground']/parent::node/@lat")
# lat_x[[1]]; lat_x[[2]]
lon_x <- xpathApply(InfoList,
  "//tag[@v = 'playground']/parent::node/@lon")

Figure: latitude coordinate
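
The three XPath queries can also be combined so that ID, latitude, and longitude end up in one data frame. The following is a minimal sketch using the XML package. The small XML document is invented so that the example runs without a request to the Overpass API, and the object names are illustrative.

```r
library(XML)

# Hypothetical miniature Overpass response with two playground nodes
xml_text <- '<osm>
  <node id="1" lat="49.50" lon="8.50">
    <tag k="leisure" v="playground"/>
  </node>
  <node id="2" lat="49.49" lon="8.54">
    <tag k="leisure" v="playground"/>
    <tag k="note" v="Rutsche"/>
  </node>
  <node id="3" lat="49.48" lon="8.47">
    <tag k="amenity" v="cafe"/>
  </node>
</osm>'
doc <- xmlParse(xml_text, asText = TRUE)

# Select the parent node of every playground tag once ...
pg_nodes <- getNodeSet(doc, "//tag[@v = 'playground']/parent::node")

# ... and read the attributes from those nodes
playgrounds <- data.frame(
  id  = sapply(pg_nodes, xmlGetAttr, "id"),
  lat = as.numeric(sapply(pg_nodes, xmlGetAttr, "lat")),
  lon = as.numeric(sapply(pg_nodes, xmlGetAttr, "lon")),
  stringsAsFactors = FALSE
)
playgrounds
```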

Package on GitHub

library(devtools)
install_github("Japhilko/gosmd")
library(gosmd)
pg_MA <- get_osm_nodes(object="leisure=playground",
                       "Mannheim")
info <- extract_osm_nodes(OSM.Data=pg_MA,
                          value="playground")

Excerpt of the results

         leisure    lat      lon      note
30560755 playground 49.51910 8.502807 NA
76468450 playground 49.49633 8.539396 Rutsche, Schaukel, großer Sandkasten, Tischtennis
76468534 playground 49.49678 8.552959 NA
76468535 playground 49.49230 8.548750 NA
76468536 playground 49.50243 8.548140 Schaukel, Rutsche, Sandkasten, Spielhäuser, Tischtennis
76468558 playground 49.49759 8.542036 NA

More examples of working with XML data:

Further information

The packages rvest and RCurl

The rvest package

install.packages("rvest")
library(rvest)
ht <- read_html('https://www.google.co.in/search?q=guitar+repair+workshop')
links <- ht %>% html_nodes(xpath='//h3/a') %>% html_attr('href')
gsub('/url\\?q=','',sapply(strsplit(links[as.vector(grep('url',links))],split='&'),'[',1))
## [1] "http://theguitarrepairworkshop.com/"                                                                   
## [2] "http://www.guitarservices.com/"                                                                        
## [3] "http://www.guitarrepairbench.com/guitar-building-projects/guitar-workshop/guitar-workshop-project.html"
## [4] "https://www.facebook.com/The-Guitar-Repair-Workshop-847517635259712/"                                  
## [5] "https://www.taylorguitars.com/dealer/guitar-repair-workshop-ltd"                                       
## [6] "http://www.laweekly.com/music/10-best-guitar-repair-shops-in-los-angeles-4647166"                      
## [7] "http://guitarworkshopglasgow.com/pages/repairs-1"                                                      
## [8] "https://www.justdial.com/Mumbai/Guitar-Repair-Services/nct-10988623"                                   
## [9] "https://www.justdial.com/Delhi-NCR/Guitar-Repair-Services/nct-10988623"
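
The last line of the rvest example does three things at once. The sketch below takes it apart in base R on a few made-up `href` values of the kind the search page returns; the input strings are hypothetical.

```r
# Hypothetical href values of the kind html_attr('href') returns here
links <- c(
  "/url?q=http://www.guitarservices.com/&sa=U&ved=abc",
  "/search?q=guitar+repair+workshop&tbm=isch",
  "/url?q=http://guitarworkshopglasgow.com/pages/repairs-1&sa=U&ved=def"
)

# 1. Keep only the entries that contain "url"
redirects <- links[grep("url", links)]

# 2. Cut each entry at "&" and keep the first piece
first_part <- sapply(strsplit(redirects, split = "&"), "[", 1)

# 3. Remove the "/url?q=" prefix to obtain the target address
targets <- gsub("/url\\?q=", "", first_part)
targets
```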

Up and away

url4<-"http://api.openstreetmap.org/api/0.6/node/25439439"
obj4 <- xmlParse(url4)
xpathApply(obj4,"//tag[@k = 'railway:station_category']")[[1]]
## <tag k="railway:station_category" v="2"/>

Excursus: station categories (Bahnhofskategorie)

library(rvest)
bhfkat<-read_html(
  "https://de.wikipedia.org/wiki/Bahnhofskategorie")
df_html_bhfkat<-html_table(
  html_nodes(bhfkat, "table")[[1]],fill = TRUE)

Overview of station categories

Category  Platform edges  Platform length   Travellers/day     Train stops/day
6         1               > 0 to 90 m       0 to 49            0 to 10
5         2               > 90 to 140 m     50 to 299          11 to 50
4         3 to 4          > 140 to 170 m    300 to 999         51 to 100
3         5 to 9          > 170 to 210 m    1,000 to 9,999     101 to 500
2         10 to 14        > 210 to 280 m    10,000 to 49,999   501 to 1,000
1         15 or more      > 280 m           50,000 or more     1,001 or more

Web scraping

Required packages

install.packages("tidyverse")
library(tidyverse)

Additional required packages

library(stringr)
library(forcats)
library(ggmap)
library(rvest)

Collecting data from Wikipedia

html.world_ports <- read_html("https://en.wikipedia.org/wiki/List_of_busiest_container_ports")
df.world_ports <- html_table(html_nodes(html.world_ports, "table")[[2]], fill = TRUE)
library(DT)
datatable(df.world_ports)

Inspecting the data

glimpse(df.world_ports)
## Observations: 50
## Variables: 15
## $ Rank     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16...
## $ Port     <chr> "Shanghai", "Singapore", "Shenzhen", "Ningbo-Zhoushan...
## $ Economy  <chr> "China", "Singapore", "China", "China", "Hong Kong", ...
## $ 2015[1]  <chr> "36,516", "30,922", "24,142", "20,636", "20,073", "19...
## $ 2014[2]  <chr> "35,268", "33,869", "23,798", "19,450", "22,374", "18...
## $ 2013[3]  <chr> "33,617", "32,240", "23,280", "17,351", "22,352", "17...
## $ 2012[4]  <chr> "32,529", "31,649", "22,940", "16,670", "23,117", "17...
## $ 2011[5]  <chr> "31,700", "29,937", "22,570", "14,686", "24,384", "16...
## $ 2010[6]  <chr> "29,069", "28,431", "22,510", "13,144", "23,532", "14...
## $ 2009[7]  <chr> "25,002", "25,866", "18,250", "10,502", "20,983", "11...
## $ 2008[8]  <chr> "27,980", "29,918", "21,414", "11,226", "24,248", "13...
## $ 2007[9]  <chr> "26,150", "27,932", "21,099", "9,349", "23,881", "13,...
## $ 2006[10] <chr> "21,710", "24,792", "18,469", "7,068", "23,539", "12,...
## $ 2005[11] <chr> "18,084", "23,192", "16,197", "5,208", "22,427", "11,...
## $ 2004[12] <chr> "14,557", "21,329", "13,615", "4,006", "21,984", "11,...
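
The `glimpse()` output shows two things that get in the way of analysis: the year columns carry footnote markers such as `[1]` in their names, and the values are character strings with thousands separators. A minimal base R sketch of one way to clean this up follows. It works on a three-row excerpt typed in from the output above, and the object name `ports` is illustrative.

```r
# Three-row excerpt typed in from the glimpse() output above
ports <- data.frame(
  Rank = 1:3,
  Port = c("Shanghai", "Singapore", "Shenzhen"),
  `2015[1]` = c("36,516", "30,922", "24,142"),
  `2014[2]` = c("35,268", "33,869", "23,798"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Drop the footnote markers such as "[1]" from the column names
names(ports) <- gsub("\\[[0-9]+\\]", "", names(ports))

# Remove the thousands separators and convert the year columns to numbers
year_cols <- grep("^[0-9]{4}$", names(ports), value = TRUE)
ports[year_cols] <- lapply(ports[year_cols],
                           function(x) as.numeric(gsub(",", "", x)))
str(ports)
```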

Use Case - Scraping Wikipedia

Introduction

The following shows how to download, process, and analyse text from Wikipedia.

install.packages("stringi")
install.packages("NLP")
install.packages("tm")
install.packages("FactoMineR")

The packages used

library("stringi")
library("tm")
library("FactoMineR")

Downloading the text data

wiki <- "http://de.wikipedia.org/wiki/"

titles <- c("Zika-Virus", "Influenza-A-Virus_H1N1", 
            "Spanische_Grippe","Influenzavirus",
            "Vogelgrippe_H5N1",
            "Legionellose-Ausbruch_in_Warstein_2013",
            "Legionellose-Ausbruch_in_Jülich_2014")

Downloading the pages

articles <- character(length(titles))

# read each page and collapse its lines into a single string
for (i in seq_along(titles)) {
    articles[i] <- stri_flatten(
      readLines(stri_paste(wiki, titles[i])), collapse = " ")
}

docs <- Corpus(VectorSource(articles))

Preparing the data

The following is based on a blog post by Norbert Ryciak on the automatic categorisation of Wikipedia articles.

# strip HTML tags, then replace tabs by spaces
docs2 <- tm_map(docs, content_transformer(function(x)
  stri_replace_all_regex(x, "<.+?>", " ")))
docs3 <- tm_map(docs2, content_transformer(function(x)
  stri_replace_all_fixed(x, "\t", " ")))
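
The two replacements can be tried out on a single string before they are applied to the whole corpus. The sketch below uses base R `gsub()` as a stand-in for the stringi functions and an invented HTML fragment; the extra whitespace clean-up at the end is an addition for readability.

```r
# Hypothetical HTML fragment standing in for one downloaded article
html <- "<p>The <b>Zika virus</b>\tbelongs to the genus <a href=\"/wiki/Flavivirus\">Flavivirus</a>.</p>"

# "<.+?>" matches lazily from "<" to the next ">", i.e. one tag at a time
no_tags <- gsub("<.+?>", " ", html, perl = TRUE)

# Replace tabs by spaces, then collapse runs of whitespace
no_tabs <- gsub("\t", " ", no_tags, fixed = TRUE)
clean   <- trimws(gsub("\\s+", " ", no_tabs))
clean
```

Without the `?` the pattern would be greedy and remove everything between the first `<` and the last `>` of the string.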

Further processing of the text

docs4 <- tm_map(docs3, PlainTextDocument)
docs5 <- tm_map(docs4, stripWhitespace)
docs6 <- tm_map(docs5, removeWords, stopwords("german"))
docs7 <- tm_map(docs6, removePunctuation)
# base functions such as tolower must be wrapped in content_transformer()
docs8 <- tm_map(docs7, content_transformer(tolower))
dtm <- DocumentTermMatrix(docs8)

Principal Component Analysis

dtm2 <- as.matrix(dtm)
frequency <- colSums(dtm2)
frequency <- sort(frequency, decreasing=TRUE)
words <- frequency[frequency>20]
s <- dtm2[1,which(colnames(dtm2) %in% names(words))]

for(i in 2:nrow(dtm2)){
  s <- cbind(s,dtm2[i,which(colnames(dtm2) %in% 
                              names(words))])
} 

colnames(s) <- titles
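
The loop builds a matrix with one row per frequent term and one column per document. The same result can be obtained by subsetting the columns and transposing, as the following base R sketch on a made-up document-term matrix shows; the counts and names are hypothetical.

```r
# Hypothetical toy document-term matrix: 3 documents (rows), 4 terms (columns)
dtm2 <- matrix(c(30, 2, 11, 0,
                 25, 1, 14, 3,
                  5, 0, 40, 1),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("doc1", "doc2", "doc3"),
                               c("virus", "und", "grippe", "jahr")))

# Keep the terms whose total frequency exceeds 20
frequency <- colSums(dtm2)
keep <- frequency > 20

# Loop version, as above: one column per document
s_loop <- dtm2[1, keep]
for (i in 2:nrow(dtm2)) {
  s_loop <- cbind(s_loop, dtm2[i, keep])
}

# Vectorised version: subset the columns and transpose
s_vec <- t(dtm2[, keep])

all(s_loop == s_vec)
```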

Result

PCA(s)

## **Results for the Principal Component Analysis (PCA)**
## The analysis was performed on 125 individuals, described by 7 variables
## *The results are available in the following objects:
## 
##    name               description                          
## 1  "$eig"             "eigenvalues"                        
## 2  "$var"             "results for the variables"          
## 3  "$var$coord"       "coord. for the variables"           
## 4  "$var$cor"         "correlations variables - dimensions"
## 5  "$var$cos2"        "cos2 for the variables"             
## 6  "$var$contrib"     "contributions of the variables"     
## 7  "$ind"             "results for the individuals"        
## 8  "$ind$coord"       "coord. for the individuals"         
## 9  "$ind$cos2"        "cos2 for the individuals"           
## 10 "$ind$contrib"     "contributions of the individuals"   
## 11 "$call"            "summary statistics"                 
## 12 "$call$centre"     "mean of the variables"              
## 13 "$call$ecart.type" "standard error of the variables"    
## 14 "$call$row.w"      "weights for the individuals"        
## 15 "$call$col.w"      "weights for the variables"

Result

The dendrogram

s0 <- s/apply(s,1,sd)
h <- hclust(dist(t(s0)), method = "ward.D")

plot(h, labels = titles, sub = "")
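
Two details of the clustering step are easy to miss: dividing the matrix by a vector scales each row (term), and the transpose makes `dist()` compare articles rather than terms. The sketch below illustrates both on invented counts. It uses `"ward.D"`, the name under which current versions of R provide the method formerly called `"ward"`.

```r
# Hypothetical term counts: 4 terms (rows) in 3 articles (columns)
s <- matrix(c(30, 25,  5,
              11, 14, 40,
               8,  9,  2,
               1,  3, 20),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("virus", "grippe", "infektion", "wasser"),
                            c("A", "B", "C")))

# Dividing a matrix by a vector recycles the vector down the columns,
# so row i is divided by the standard deviation of row i
s0 <- s / apply(s, 1, sd)

# Cluster the articles (columns), hence the transpose before dist()
h <- hclust(dist(t(s0)), method = "ward.D")
h$merge
```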

Shiny Apps

Installing the shiny package

install.packages("shiny")

Who invented it?

citation("shiny")
## 
## To cite package 'shiny' in publications use:
## 
##   Winston Chang, Joe Cheng, JJ Allaire, Yihui Xie and Jonathan
##   McPherson (2017). shiny: Web Application Framework for R. R
##   package version 1.0.1. https://CRAN.R-project.org/package=shiny
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {shiny: Web Application Framework for R},
##     author = {Winston Chang and Joe Cheng and JJ Allaire and Yihui Xie and Jonathan McPherson},
##     year = {2017},
##     note = {R package version 1.0.1},
##     url = {https://CRAN.R-project.org/package=shiny},
##   }

A first example app

library(shiny)
runExample("01_hello")

Getting started

Naming the app

The first app

Explanation

A second example app

library(shiny)
runExample("02_text")

Introduction to Shiny

R and Git

RStudio and Git: creating a project

A project with version control

Selecting version control

Cloning a project

The Git tab in RStudio

Committing your current changes

The usual workflow

Commands

git commit
git push

http://stackoverflow.com/questions/1125968/force-git-to-overwrite-local-files-on-pull

Problems with disk space

  1. WinDirStat
  2. https://support.microsoft.com/de-de/kb/912997
  3. http://www.pcwelt.de/tipps/Update-Dateien-loeschen-8357046.html

Source for packages

Installing a package from GitHub

install.packages("devtools")
library(devtools)
install_github("Japhilko/gosmd")

Searching for datasets

Git and RStudio

C++ integration: overview of the Rcpp package

Why integrate C++?

Robert Gentleman, in R Programming for Bioinformatics, 2008, about R’s built-in C interfaces:

Since R is not compiled, in some situations its performance can be substantially improved by writing code in a compiled language. There are also reasons not to write code in other languages, and in particular we caution against premature optimization, prototyping in R is often cost effective. And in our experience very few routines need to be implemented in other languages for efficiency reasons. Another substantial reason not to use an implementation in some other language is increased complexity. The use of another language almost always results in higher maintenance costs and less stability. In addition, any extensions or enhancements of the code will require someone that is proficient in both R and the other language.

Why and when?

Prerequisite: a compiler

For Windows: Rtools

  1. http://cran.r-project.org/bin/windows/Rtools/
  2. http://cran.r-project.org/doc/manuals/R-admin.html#The-Windows-toolset

For Mac: Xcode

  1. http://cran.r-project.org/doc/manuals/R-admin.html#Installing-R-under-_0028Mac_0029-OS-X
  2. http://cran.r-project.org/doc/manuals/R-admin.html#Mac-OS-X

What we will use

We will use the following two packages:

Rcpp

Introduction

The R package Rcpp

install.packages("Rcpp")
library(Rcpp)
cppFunction('int add(int x, int y, int z) {
  int sum = x + y + z;
  return sum;
}')
# add works like a regular R function
add
add(1, 2, 3)

Rcpp

Tutorial on Rcpp by Hadley Wickham

Benchmarking

install.packages("microbenchmark")
library(microbenchmark)
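
Before timing C++ code against R code it is worth checking that both variants return the same result and that the comparison is repeated several times. The sketch below shows the idea in base R with `system.time()` as a stand-in for the microbenchmark package, comparing an R loop with the built-in `sum()`. The function `sum_loop` is illustrative, and the timings depend on the machine.

```r
# Two ways to add up a numeric vector in plain R
sum_loop <- function(x) {
  total <- 0
  for (value in x) total <- total + value
  total
}

x <- as.numeric(1:1e6)

# Both variants must return the same result before timing them
stopifnot(isTRUE(all.equal(sum_loop(x), sum(x))))

# Elapsed time of 10 repetitions each; the numbers depend on the machine
t_loop <- system.time(for (i in 1:10) sum_loop(x))["elapsed"]
t_vec  <- system.time(for (i in 1:10) sum(x))["elapsed"]
c(loop = unname(t_loop), vectorised = unname(t_vec))
```

With the package loaded, `microbenchmark(sum_loop(x), sum(x))` would give a comparable summary with more detailed statistics.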

Resources

Overview of parallel computing options: the parallel package

Databases and R

What are databases?

When should R be complemented with a database?

The interface to databases is used,…

The three major open-source databases

SQLite

MySQL database

PostgreSQL

Comparison of MySQL and PostgreSQL

Example of relational databases

What is the difference between SQL and NoSQL?

MongoDB

CouchDB

Podcast on CouchDB

Quick-R on database integration

Learning SQL…

Further learning

Further resources

The R package dplyr

The dplyr package

install.packages("nycflights13")
library(nycflights13)
dim(flights)
## [1] 336776     19
head(flights)
## # A tibble: 6 × 19
##    year month   day dep_time sched_dep_time dep_delay arr_time
##   <int> <int> <int>    <int>          <int>     <dbl>    <int>
## 1  2013     1     1      517            515         2      830
## 2  2013     1     1      533            529         4      850
## 3  2013     1     1      542            540         2      923
## 4  2013     1     1      544            545        -1     1004
## 5  2013     1     1      554            600        -6      812
## 6  2013     1     1      554            558        -4      740
## # ... with 12 more variables: sched_arr_time <int>, arr_delay <dbl>,
## #   carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
## #   air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>,
## #   time_hour <dttm>

Filtering rows with filter()

library(dplyr)
head(filter(flights, month == 1,day==1))
## # A tibble: 6 × 19
##    year month   day dep_time sched_dep_time dep_delay arr_time
##   <int> <int> <int>    <int>          <int>     <dbl>    <int>
## 1  2013     1     1      517            515         2      830
## 2  2013     1     1      533            529         4      850
## 3  2013     1     1      542            540         2      923
## 4  2013     1     1      544            545        -1     1004
## 5  2013     1     1      554            600        -6      812
## 6  2013     1     1      554            558        -4      740
## # ... with 12 more variables: sched_arr_time <int>, arr_delay <dbl>,
## #   carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
## #   air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>,
## #   time_hour <dttm>

First steps with dplyr

install.packages("downloader")
library(downloader)
url <- "https://raw.githubusercontent.com/genomicsclass/dagdata/master/inst/extdata/msleep_ggplot2.csv"
filename <- "msleep_ggplot2.csv"
if (!file.exists(filename)) download(url,filename)
msleep <- read.csv("msleep_ggplot2.csv")
head(msleep)
##                         name      genus  vore        order conservation
## 1                    Cheetah   Acinonyx carni    Carnivora           lc
## 2                 Owl monkey      Aotus  omni     Primates         <NA>
## 3            Mountain beaver Aplodontia herbi     Rodentia           nt
## 4 Greater short-tailed shrew    Blarina  omni Soricomorpha           lc
## 5                        Cow        Bos herbi Artiodactyla domesticated
## 6           Three-toed sloth   Bradypus herbi       Pilosa         <NA>
##   sleep_total sleep_rem sleep_cycle awake brainwt  bodywt
## 1        12.1        NA          NA  11.9      NA  50.000
## 2        17.0       1.8          NA   7.0 0.01550   0.480
## 3        14.4       2.4          NA   9.6      NA   1.350
## 4        14.9       2.3   0.1333333   9.1 0.00029   0.019
## 5         4.0       0.7   0.6666667  20.0 0.42300 600.000
## 6        14.4       2.2   0.7666667   9.6      NA   3.850
sleepData <- select(msleep, name, sleep_total)
head(sleepData)
##                         name sleep_total
## 1                    Cheetah        12.1
## 2                 Owl monkey        17.0
## 3            Mountain beaver        14.4
## 4 Greater short-tailed shrew        14.9
## 5                        Cow         4.0
## 6           Three-toed sloth        14.4

Integrating PostgreSQL with the RPostgreSQL package

PostgreSQL

Installing PostgreSQL

Installing pgAdmin

How do I get data into the database?

# install.packages("RPostgreSQL")
library("RPostgreSQL")

Migrating geodata into the database

sudo -u postgres createuser Japhilko
sudo -u postgres createdb -E UTF8 -O Japhilko offlgeoc

The PostGIS extension must be installed for the database:

CREATE EXTENSION postgis;

Program for importing OSM data into PostgreSQL: osm2pgsql

osm2pgsql -c -d osmBerlin --slim -C  -k  berlin-latest.osm.pbf

The hstore extension

CREATE EXTENSION hstore;
osm2pgsql -s -U postgres -d offlgeoc /home/kolb/Forschung/osmData/data/saarland-latest.osm.pbf 

Database for geocoding

sudo -u postgres createdb -E UTF8 -O Japhilko offlgeocRLP
CREATE EXTENSION postgis;
osm2pgsql -s -U postgres -d offlgeocRLP -o gazetteer /home/kolb/Forschung/osmData/data/rheinland-pfalz-latest.osm.pbf 

This returns all administrative boundaries:

SELECT name FROM planet_osm_polygon WHERE boundary='administrative'

Back to R

pw <- {"1234"}
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, dbname = "offlgeocRLP",
                 host = "localhost", port = 5432,
                 user = "postgres", password = pw)
rm(pw) # removes the password
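
Writing the password into the script is convenient for a demonstration, but it ends up in every copy of the file. One common alternative, sketched here in base R, is to read it from an environment variable. The variable name `PGPASSWORD_DEMO` is made up for this example.

```r
# ASSUMPTION: the variable name PGPASSWORD_DEMO is made up for this sketch.
# In practice it would be set outside the script, e.g. in ~/.Renviron;
# it is set here only so that the example runs on its own.
Sys.setenv(PGPASSWORD_DEMO = "1234")

pw <- Sys.getenv("PGPASSWORD_DEMO", unset = NA)
if (is.na(pw)) stop("Please set PGPASSWORD_DEMO before connecting")

# pw can now be passed to dbConnect(..., password = pw) as above
nchar(pw)
rm(pw)  # remove the password from the workspace again
```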

dbExistsTable(con, "planet_osm_polygon")
df_postgres <- dbGetQuery(con, "SELECT name, admin_level FROM planet_osm_polygon WHERE boundary='administrative'")
barplot(table(df_postgres[,2]),col="royalblue")

df_adm8 <- dbGetQuery(con, "SELECT name, admin_level FROM planet_osm_polygon WHERE boundary='administrative' AND admin_level='8'")
library(knitr)
# kable(head(df_adm8))

# Nordring restricted to major roads, near a city or town named Ludwigshafen
df_hnr_major <- dbGetQuery(con, "SELECT * FROM planet_osm_line, planet_osm_point 
WHERE planet_osm_line.name='Nordring' AND planet_osm_line.highway IN ('motorway','trunk','primary')
AND planet_osm_point.name='Ludwigshafen' AND planet_osm_point.place IN ('city', 'town')
ORDER BY ST_Distance(planet_osm_line.way, planet_osm_point.way)")
# the same query without the restrictions on road type and place type
df_hnr <- dbGetQuery(con, "SELECT * FROM planet_osm_line, planet_osm_point 
WHERE planet_osm_line.name='Nordring' AND planet_osm_point.name='Ludwigshafen' 
ORDER BY ST_Distance(planet_osm_line.way, planet_osm_point.way)")
head(df_hnr)
colnames(df_hnr)
table(df_hnr$name)

An address within a town

df_sipp <- dbGetQuery(con, "SELECT * FROM planet_osm_line, planet_osm_point 
WHERE planet_osm_line.name='Rechweg' AND planet_osm_point.name='Sippersfeld' 
ORDER BY ST_Distance(planet_osm_line.way, planet_osm_point.way)")
head(df_sipp)

OpenStreetMap and Open Government Data in PostGIS

restnam <- dbGetQuery(con, "SELECT name, COUNT(osm_id) AS anzahl
FROM planet_osm_point
WHERE amenity = 'restaurant'
  AND name <> ''
GROUP BY name
ORDER BY anzahl DESC
LIMIT 10")
head(restnam)
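
The aggregation query can be tried out without a PostgreSQL server. The sketch below runs the same SQL against an in-memory SQLite database through DBI and RSQLite; this is a substitute for the PostGIS setup above, not part of it, and the table contents are invented.

```r
library(DBI)

# ASSUMPTION: an in-memory SQLite table stands in for the PostGIS table;
# the rows are invented for illustration.
con_demo <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con_demo, "planet_osm_point", data.frame(
  osm_id  = 1:7,
  amenity = c("restaurant", "restaurant", "restaurant", "cafe",
              "restaurant", "restaurant", "restaurant"),
  name    = c("Adler", "Adler", "Krone", "Adler", "", "Adler", "Krone"),
  stringsAsFactors = FALSE
))

# Same query as above: count restaurants per name, most frequent first
restnam_demo <- dbGetQuery(con_demo, "SELECT name, COUNT(osm_id) AS anzahl
FROM planet_osm_point
WHERE amenity = 'restaurant'
  AND name <> ''
GROUP BY name
ORDER BY anzahl DESC
LIMIT 10")
restnam_demo

dbDisconnect(con_demo)
```

The cafe and the restaurant without a name are excluded by the `WHERE` clause before the rows are grouped.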

PostgreSQL and Leaflet

install.packages("plot3D")
library(plot3D)
library(RPostgreSQL)

RMySQL

install.packages("RMySQL")

Using other databases (MongoDB, MySQL)